aboutsummaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorDmitry Torokhov <dmitry.torokhov@gmail.com>2009-09-13 21:16:56 -0700
committerDmitry Torokhov <dmitry.torokhov@gmail.com>2009-09-13 21:16:56 -0700
commitfc8e1ead9314cf0e0f1922e661428b93d3a50d88 (patch)
treef3cb97c4769b74f6627a59769f1ed5c92a13c58a /Documentation
parent2bcaa6a4238094c5695d5b1943078388d82d3004 (diff)
parent9de48cc300fb10f7d9faa978670becf5e352462a (diff)
Merge branch 'next' into for-linus
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-block68
-rw-r--r--Documentation/ABI/testing/sysfs-bus-pci7
-rw-r--r--Documentation/ABI/testing/sysfs-bus-pci-devices-cciss33
-rw-r--r--Documentation/ABI/testing/sysfs-class-mtd125
-rw-r--r--Documentation/ABI/testing/sysfs-devices-cache_disable18
-rw-r--r--Documentation/ABI/testing/sysfs-fs-ext410
-rw-r--r--Documentation/ABI/testing/sysfs-pps73
-rw-r--r--Documentation/Changes26
-rw-r--r--Documentation/CodingStyle4
-rw-r--r--Documentation/DMA-API.txt16
-rw-r--r--Documentation/DocBook/Makefile3
-rw-r--r--Documentation/DocBook/debugobjects.tmpl2
-rw-r--r--Documentation/DocBook/kernel-hacking.tmpl4
-rw-r--r--Documentation/DocBook/mac80211.tmpl3
-rw-r--r--Documentation/DocBook/tracepoint.tmpl89
-rw-r--r--Documentation/PCI/pcieaer-howto.txt25
-rw-r--r--Documentation/RCU/rculist_nulls.txt9
-rw-r--r--Documentation/RCU/trace.txt102
-rw-r--r--Documentation/SM501.txt2
-rw-r--r--Documentation/Smack.txt20
-rw-r--r--Documentation/SubmitChecklist2
-rw-r--r--Documentation/SubmittingPatches82
-rw-r--r--Documentation/accounting/getdelays.c3
-rw-r--r--Documentation/arm/Samsung-S3C24XX/GPIO.txt10
-rw-r--r--Documentation/arm/memory.txt2
-rw-r--r--Documentation/atomic_ops.txt4
-rw-r--r--Documentation/block/biodoc.txt2
-rw-r--r--Documentation/block/data-integrity.txt4
-rw-r--r--Documentation/block/deadline-iosched.txt2
-rw-r--r--Documentation/braille-console.txt2
-rw-r--r--Documentation/cdrom/packet-writing.txt2
-rw-r--r--Documentation/cgroups/cpusets.txt12
-rw-r--r--Documentation/cgroups/memory.txt16
-rw-r--r--Documentation/connector/cn_test.c11
-rw-r--r--Documentation/connector/ucon.c2
-rw-r--r--Documentation/cpu-freq/cpu-drivers.txt2
-rw-r--r--Documentation/cpu-freq/governors.txt26
-rw-r--r--Documentation/cpu-freq/user-guide.txt1
-rw-r--r--Documentation/dell_rbu.txt4
-rw-r--r--Documentation/development-process/5.Posting31
-rw-r--r--Documentation/device-mapper/dm-log.txt54
-rw-r--r--Documentation/device-mapper/dm-queue-length.txt39
-rw-r--r--Documentation/device-mapper/dm-service-time.txt91
-rw-r--r--Documentation/driver-model/device.txt32
-rw-r--r--Documentation/driver-model/devres.txt2
-rw-r--r--Documentation/driver-model/driver.txt4
-rw-r--r--Documentation/dvb/get_dvb_firmware61
-rw-r--r--Documentation/edac.txt8
-rw-r--r--Documentation/fault-injection/fault-injection.txt70
-rw-r--r--Documentation/fb/sh7760fb.txt2
-rw-r--r--Documentation/fb/vesafb.txt2
-rw-r--r--Documentation/feature-removal-schedule.txt51
-rw-r--r--Documentation/filesystems/00-INDEX4
-rw-r--r--Documentation/filesystems/9p.txt3
-rw-r--r--Documentation/filesystems/Locking45
-rw-r--r--Documentation/filesystems/afs.txt26
-rw-r--r--Documentation/filesystems/autofs4-mount-control.txt2
-rw-r--r--Documentation/filesystems/caching/netfs-api.txt2
-rw-r--r--Documentation/filesystems/debugfs.txt158
-rw-r--r--Documentation/filesystems/ext2.txt2
-rw-r--r--Documentation/filesystems/ext4.txt10
-rw-r--r--Documentation/filesystems/fiemap.txt2
-rw-r--r--Documentation/filesystems/gfs2-glocks.txt2
-rw-r--r--Documentation/filesystems/gfs2.txt19
-rw-r--r--Documentation/filesystems/isofs.txt9
-rw-r--r--Documentation/filesystems/nfs-rdma.txt2
-rw-r--r--Documentation/filesystems/nilfs2.txt5
-rw-r--r--Documentation/filesystems/proc.txt272
-rw-r--r--Documentation/filesystems/sysfs-pci.txt2
-rw-r--r--Documentation/filesystems/sysfs.txt3
-rw-r--r--Documentation/filesystems/vfat.txt13
-rw-r--r--Documentation/firmware_class/README3
-rw-r--r--Documentation/futex-requeue-pi.txt131
-rw-r--r--Documentation/gcov.txt253
-rw-r--r--Documentation/gpio.txt2
-rw-r--r--Documentation/hwmon/f71882fg12
-rw-r--r--Documentation/hwmon/ibmaem2
-rw-r--r--Documentation/hwmon/sysfs-interface19
-rw-r--r--Documentation/hwmon/tmp40142
-rw-r--r--Documentation/hwmon/w83627ehf11
-rw-r--r--Documentation/i2c/busses/i2c-ocores17
-rw-r--r--Documentation/i2c/busses/i2c-viapro4
-rw-r--r--Documentation/i2c/instantiating-devices44
-rw-r--r--Documentation/i2c/writing-clients16
-rw-r--r--Documentation/ide/ide.txt2
-rw-r--r--Documentation/input/sentelic.txt475
-rw-r--r--Documentation/ioctl/ioctl-number.txt3
-rw-r--r--Documentation/isdn/00-INDEX44
-rw-r--r--Documentation/isdn/INTERFACE.CAPI94
-rw-r--r--Documentation/isdn/README.gigaset42
-rw-r--r--Documentation/ja_JP/SubmitChecklist2
-rw-r--r--Documentation/kbuild/kconfig.txt116
-rw-r--r--Documentation/kbuild/modules.txt2
-rw-r--r--Documentation/kdump/kdump.txt4
-rw-r--r--Documentation/kernel-parameters.txt151
-rw-r--r--Documentation/kmemcheck.txt773
-rw-r--r--Documentation/kmemleak.txt151
-rw-r--r--Documentation/kobject.txt2
-rw-r--r--Documentation/kprobes.txt6
-rw-r--r--Documentation/laptops/acer-wmi.txt2
-rw-r--r--Documentation/laptops/sony-laptop.txt2
-rw-r--r--Documentation/laptops/thinkpad-acpi.txt176
-rw-r--r--Documentation/leds-lp3944.txt50
-rw-r--r--Documentation/lguest/Makefile3
-rw-r--r--Documentation/lguest/lguest.c1651
-rw-r--r--Documentation/lguest/lguest.txt1
-rw-r--r--Documentation/local_ops.txt2
-rw-r--r--Documentation/lockdep-design.txt6
-rw-r--r--Documentation/memory-barriers.txt129
-rw-r--r--Documentation/memory-hotplug.txt8
-rw-r--r--Documentation/mn10300/ABI.txt2
-rw-r--r--Documentation/mtd/nand_ecc.txt12
-rw-r--r--Documentation/networking/6pack.txt2
-rw-r--r--Documentation/networking/bonding.txt6
-rw-r--r--Documentation/networking/can.txt237
-rw-r--r--Documentation/networking/dm9000.txt2
-rw-r--r--Documentation/networking/ieee802154.txt76
-rw-r--r--Documentation/networking/ip-sysctl.txt18
-rw-r--r--Documentation/networking/ipv6.txt37
-rw-r--r--Documentation/networking/l2tp.txt2
-rw-r--r--Documentation/networking/mac80211-injection.txt28
-rw-r--r--Documentation/networking/netdevices.txt2
-rw-r--r--Documentation/networking/operstates.txt3
-rw-r--r--Documentation/networking/packet_mmap.txt140
-rw-r--r--Documentation/networking/phonet.txt2
-rw-r--r--Documentation/networking/regulatory.txt2
-rw-r--r--Documentation/power/devices.txt34
-rw-r--r--Documentation/power/regulator/consumer.txt2
-rw-r--r--Documentation/power/regulator/overview.txt2
-rw-r--r--Documentation/power/s2ram.txt2
-rw-r--r--Documentation/power/userland-swsusp.txt2
-rw-r--r--Documentation/powerpc/booting-without-of.txt1168
-rw-r--r--Documentation/powerpc/dts-bindings/4xx/emac.txt148
-rw-r--r--Documentation/powerpc/dts-bindings/can/sja1000.txt53
-rw-r--r--Documentation/powerpc/dts-bindings/ecm.txt64
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/board.txt2
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm.txt2
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/cpm_qe/gpio.txt2
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt3
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/esdhc.txt7
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/mcm.txt64
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/msi-pic.txt2
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/pmc.txt4
-rw-r--r--Documentation/powerpc/dts-bindings/gpio/gpio.txt50
-rw-r--r--Documentation/powerpc/dts-bindings/gpio/led.txt17
-rw-r--r--Documentation/powerpc/dts-bindings/gpio/mdio.txt19
-rw-r--r--Documentation/powerpc/dts-bindings/marvell.txt521
-rw-r--r--Documentation/powerpc/dts-bindings/phy.txt25
-rw-r--r--Documentation/powerpc/dts-bindings/spi-bus.txt57
-rw-r--r--Documentation/powerpc/dts-bindings/usb-ehci.txt25
-rw-r--r--Documentation/powerpc/dts-bindings/xilinx.txt295
-rw-r--r--Documentation/powerpc/qe_firmware.txt2
-rw-r--r--Documentation/pps/pps.txt172
-rw-r--r--Documentation/rbtree.txt10
-rw-r--r--Documentation/rfkill.txt640
-rw-r--r--Documentation/robust-futex-ABI.txt4
-rw-r--r--Documentation/s390/Debugging390.txt4
-rw-r--r--Documentation/scheduler/sched-nice-design.txt2
-rw-r--r--Documentation/scheduler/sched-rt-group.txt33
-rw-r--r--Documentation/scsi/aic79xx.txt2
-rw-r--r--Documentation/scsi/ncr53c8xx.txt4
-rw-r--r--Documentation/scsi/scsi_fc_transport.txt14
-rw-r--r--Documentation/scsi/scsi_mid_low_api.txt5
-rw-r--r--Documentation/scsi/sym53c8xx_2.txt2
-rw-r--r--Documentation/sound/alsa/ALSA-Configuration.txt38
-rw-r--r--Documentation/sound/alsa/HD-Audio-Models.txt20
-rw-r--r--Documentation/sound/alsa/HD-Audio.txt2
-rw-r--r--Documentation/sound/alsa/Procfile.txt41
-rw-r--r--Documentation/sound/alsa/README.maya44163
-rw-r--r--Documentation/sound/alsa/hda_codec.txt2
-rw-r--r--Documentation/sound/alsa/soc/dapm.txt1
-rw-r--r--Documentation/spi/spidev_test.c10
-rw-r--r--Documentation/sysctl/kernel.txt11
-rw-r--r--Documentation/sysctl/vm.txt27
-rw-r--r--Documentation/sysrq.txt7
-rw-r--r--Documentation/timers/hpet.txt2
-rw-r--r--Documentation/timers/timer_stats.txt2
-rw-r--r--Documentation/trace/events.txt90
-rw-r--r--Documentation/trace/ftrace.txt252
-rw-r--r--Documentation/trace/kmemtrace.txt2
-rw-r--r--Documentation/trace/mmiotrace.txt26
-rw-r--r--Documentation/trace/power.txt17
-rw-r--r--Documentation/usb/WUSB-Design-overview.txt8
-rw-r--r--Documentation/usb/anchors.txt4
-rw-r--r--Documentation/usb/callbacks.txt2
-rw-r--r--Documentation/video4linux/CARDLIST.cx238855
-rw-r--r--Documentation/video4linux/CARDLIST.cx888
-rw-r--r--Documentation/video4linux/CARDLIST.em28xx12
-rw-r--r--Documentation/video4linux/CARDLIST.saa713424
-rw-r--r--Documentation/video4linux/CARDLIST.tuner2
-rw-r--r--Documentation/video4linux/cx18.txt2
-rw-r--r--Documentation/video4linux/gspca.txt44
-rw-r--r--Documentation/video4linux/pxa_camera.txt49
-rw-r--r--Documentation/video4linux/v4l2-framework.txt29
-rw-r--r--Documentation/vm/Makefile2
-rw-r--r--Documentation/vm/balance18
-rw-r--r--Documentation/vm/page-types.c698
-rw-r--r--Documentation/vm/pagemap.txt68
-rw-r--r--Documentation/watchdog/hpwdt.txt95
-rw-r--r--Documentation/x86/00-INDEX2
-rw-r--r--Documentation/x86/boot.txt122
-rw-r--r--Documentation/x86/exception-tables.txt (renamed from Documentation/exception.txt)202
-rw-r--r--Documentation/x86/x86_64/boot-options.txt49
-rw-r--r--Documentation/x86/x86_64/machinecheck8
-rw-r--r--Documentation/x86/x86_64/mm.txt9
205 files changed, 8906 insertions, 3613 deletions
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 44f52a4f590..5f3bedaf8e3 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -60,3 +60,71 @@ Description:
Indicates whether the block layer should automatically
generate checksums for write requests bound for
devices that support receiving integrity metadata.
+
+What: /sys/block/<disk>/alignment_offset
+Date: April 2009
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ Storage devices may report a physical block size that is
+ bigger than the logical block size (for instance a drive
+ with 4KB physical sectors exposing 512-byte logical
+ blocks to the operating system). This parameter
+ indicates how many bytes the beginning of the device is
+ offset from the disk's natural alignment.
+
+What: /sys/block/<disk>/<partition>/alignment_offset
+Date: April 2009
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ Storage devices may report a physical block size that is
+ bigger than the logical block size (for instance a drive
+ with 4KB physical sectors exposing 512-byte logical
+ blocks to the operating system). This parameter
+ indicates how many bytes the beginning of the partition
+ is offset from the disk's natural alignment.
+
+What: /sys/block/<disk>/queue/logical_block_size
+Date: May 2009
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ This is the smallest unit the storage device can
+ address. It is typically 512 bytes.
+
+What: /sys/block/<disk>/queue/physical_block_size
+Date: May 2009
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ This is the smallest unit a physical storage device can
+ write atomically. It is usually the same as the logical
+ block size but may be bigger. One example is SATA
+ drives with 4KB sectors that expose a 512-byte logical
+ block size to the operating system. For stacked block
+ devices the physical_block_size variable contains the
+ maximum physical_block_size of the component devices.
+
+What: /sys/block/<disk>/queue/minimum_io_size
+Date: April 2009
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ Storage devices may report a granularity or preferred
+ minimum I/O size which is the smallest request the
+ device can perform without incurring a performance
+ penalty. For disk drives this is often the physical
+ block size. For RAID arrays it is often the stripe
+ chunk size. A properly aligned multiple of
+ minimum_io_size is the preferred request size for
+ workloads where a high number of I/O operations is
+ desired.
+
+What: /sys/block/<disk>/queue/optimal_io_size
+Date: April 2009
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ Storage devices may report an optimal I/O size, which is
+ the device's preferred unit for sustained I/O. This is
+ rarely reported for disk drives. For RAID arrays it is
+ usually the stripe width or the internal track size. A
+ properly aligned multiple of optimal_io_size is the
+ preferred request size for workloads where sustained
+ throughput is desired. If no optimal I/O size is
+ reported this file contains 0.
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 97ad190e13a..6bf68053e4b 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -122,3 +122,10 @@ Description:
This symbolic link appears when a device is a Virtual Function.
The symbolic link points to the PCI device sysfs entry of the
Physical Function this device associates with.
+
+What: /sys/bus/pci/slots/.../module
+Date: June 2009
+Contact: linux-pci@vger.kernel.org
+Description:
+ This symbolic link points to the PCI hotplug controller driver
+ module that manages the hotplug slot.
diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
new file mode 100644
index 00000000000..0a92a7c93a6
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
@@ -0,0 +1,33 @@
+Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/model
+Date: March 2009
+Kernel Version: 2.6.30
+Contact: iss_storagedev@hp.com
+Description: Displays the SCSI INQUIRY page 0 model for logical drive
+ Y of controller X.
+
+Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/rev
+Date: March 2009
+Kernel Version: 2.6.30
+Contact: iss_storagedev@hp.com
+Description: Displays the SCSI INQUIRY page 0 revision for logical
+ drive Y of controller X.
+
+Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/unique_id
+Date: March 2009
+Kernel Version: 2.6.30
+Contact: iss_storagedev@hp.com
+Description: Displays the SCSI INQUIRY page 83 serial number for logical
+ drive Y of controller X.
+
+Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/vendor
+Date: March 2009
+Kernel Version: 2.6.30
+Contact: iss_storagedev@hp.com
+Description: Displays the SCSI INQUIRY page 0 vendor for logical drive
+ Y of controller X.
+
+Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/block:cciss!cXdY
+Date: March 2009
+Kernel Version: 2.6.30
+Contact: iss_storagedev@hp.com
+Description: A symbolic link to /sys/block/cciss!cXdY
diff --git a/Documentation/ABI/testing/sysfs-class-mtd b/Documentation/ABI/testing/sysfs-class-mtd
new file mode 100644
index 00000000000..4d55a188898
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-mtd
@@ -0,0 +1,125 @@
+What: /sys/class/mtd/
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ The mtd/ class subdirectory belongs to the MTD subsystem
+ (MTD core).
+
+What: /sys/class/mtd/mtdX/
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ The /sys/class/mtd/mtd{0,1,2,3,...} directories correspond
+ to each /dev/mtdX character device. These may represent
+ physical/simulated flash devices, partitions on a flash
+ device, or concatenated flash devices. They exist regardless
+ of whether CONFIG_MTD_CHAR is actually enabled.
+
+What: /sys/class/mtd/mtdXro/
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ These directories provide the corresponding read-only device
+ nodes for /sys/class/mtd/mtdX/ . They are only created
+ (for the benefit of udev) if CONFIG_MTD_CHAR is enabled.
+
+What: /sys/class/mtd/mtdX/dev
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ Major and minor numbers of the character device corresponding
+ to this MTD device (in <major>:<minor> format). This is the
+ read-write device so <minor> will be even.
+
+What: /sys/class/mtd/mtdXro/dev
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ Major and minor numbers of the character device corresponding
+ to the read-only variant of thie MTD device (in
+ <major>:<minor> format). In this case <minor> will be odd.
+
+What: /sys/class/mtd/mtdX/erasesize
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ "Major" erase size for the device. If numeraseregions is
+ zero, this is the eraseblock size for the entire device.
+ Otherwise, the MEMGETREGIONCOUNT/MEMGETREGIONINFO ioctls
+ can be used to determine the actual eraseblock layout.
+
+What: /sys/class/mtd/mtdX/flags
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ A hexadecimal value representing the device flags, ORed
+ together:
+
+ 0x0400: MTD_WRITEABLE - device is writable
+ 0x0800: MTD_BIT_WRITEABLE - single bits can be flipped
+ 0x1000: MTD_NO_ERASE - no erase necessary
+ 0x2000: MTD_POWERUP_LOCK - always locked after reset
+
+What: /sys/class/mtd/mtdX/name
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ A human-readable ASCII name for the device or partition.
+ This will match the name in /proc/mtd .
+
+What: /sys/class/mtd/mtdX/numeraseregions
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ For devices that have variable eraseblock sizes, this
+ provides the total number of erase regions. Otherwise,
+ it will read back as zero.
+
+What: /sys/class/mtd/mtdX/oobsize
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ Number of OOB bytes per page.
+
+What: /sys/class/mtd/mtdX/size
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ Total size of the device/partition, in bytes.
+
+What: /sys/class/mtd/mtdX/type
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ One of the following ASCII strings, representing the device
+ type:
+
+ absent, ram, rom, nor, nand, dataflash, ubi, unknown
+
+What: /sys/class/mtd/mtdX/writesize
+Date: April 2009
+KernelVersion: 2.6.29
+Contact: linux-mtd@lists.infradead.org
+Description:
+ Minimal writable flash unit size. This will always be
+ a positive integer.
+
+ In the case of NOR flash it is 1 (even though individual
+ bits can be cleared).
+
+ In the case of NAND flash it is one NAND page (or a
+ half page, or a quarter page).
+
+ In the case of ECC NOR, it is the ECC block size.
diff --git a/Documentation/ABI/testing/sysfs-devices-cache_disable b/Documentation/ABI/testing/sysfs-devices-cache_disable
new file mode 100644
index 00000000000..175bb4f7051
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-cache_disable
@@ -0,0 +1,18 @@
+What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
+Date: August 2008
+KernelVersion: 2.6.27
+Contact: mark.langsdorf@amd.com
+Description: These files exist in every cpu's cache index directories.
+ There are currently 2 cache_disable_# files in each
+ directory. Reading from these files on a supported
+ processor will return that cache disable index value
+ for that processor and node. Writing to one of these
+ files will cause the specificed cache index to be disabled.
+
+ Currently, only AMD Family 10h Processors support cache index
+ disable, and only for their L3 caches. See the BIOS and
+ Kernel Developer's Guide at
+ http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf
+ for formatting information and other details on the
+ cache index disable.
+Users: joachim.deguara@amd.com
diff --git a/Documentation/ABI/testing/sysfs-fs-ext4 b/Documentation/ABI/testing/sysfs-fs-ext4
index 4e79074de28..5fb709997d9 100644
--- a/Documentation/ABI/testing/sysfs-fs-ext4
+++ b/Documentation/ABI/testing/sysfs-fs-ext4
@@ -79,3 +79,13 @@ Description:
This file is read-only and shows the number of
kilobytes of data that have been written to this
filesystem since it was mounted.
+
+What: /sys/fs/ext4/<disk>/inode_goal
+Date: June 2008
+Contact: "Theodore Ts'o" <tytso@mit.edu>
+Description:
+ Tuning parameter which (if non-zero) controls the goal
+ inode used by the inode allocator in p0reference to
+ all other allocation hueristics. This is intended for
+ debugging use only, and should be 0 on production
+ systems.
diff --git a/Documentation/ABI/testing/sysfs-pps b/Documentation/ABI/testing/sysfs-pps
new file mode 100644
index 00000000000..25028c7bc37
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-pps
@@ -0,0 +1,73 @@
+What: /sys/class/pps/
+Date: February 2008
+Contact: Rodolfo Giometti <giometti@linux.it>
+Description:
+ The /sys/class/pps/ directory will contain files and
+ directories that will provide a unified interface to
+ the PPS sources.
+
+What: /sys/class/pps/ppsX/
+Date: February 2008
+Contact: Rodolfo Giometti <giometti@linux.it>
+Description:
+ The /sys/class/pps/ppsX/ directory is related to X-th
+ PPS source into the system. Each directory will
+ contain files to manage and control its PPS source.
+
+What: /sys/class/pps/ppsX/assert
+Date: February 2008
+Contact: Rodolfo Giometti <giometti@linux.it>
+Description:
+ The /sys/class/pps/ppsX/assert file reports the assert events
+ and the assert sequence number of the X-th source in the form:
+
+ <secs>.<nsec>#<sequence>
+
+ If the source has no assert events the content of this file
+ is empty.
+
+What: /sys/class/pps/ppsX/clear
+Date: February 2008
+Contact: Rodolfo Giometti <giometti@linux.it>
+Description:
+ The /sys/class/pps/ppsX/clear file reports the clear events
+ and the clear sequence number of the X-th source in the form:
+
+ <secs>.<nsec>#<sequence>
+
+ If the source has no clear events the content of this file
+ is empty.
+
+What: /sys/class/pps/ppsX/mode
+Date: February 2008
+Contact: Rodolfo Giometti <giometti@linux.it>
+Description:
+ The /sys/class/pps/ppsX/mode file reports the functioning
+ mode of the X-th source in hexadecimal encoding.
+
+ Please, refer to linux/include/linux/pps.h for further
+ info.
+
+What: /sys/class/pps/ppsX/echo
+Date: February 2008
+Contact: Rodolfo Giometti <giometti@linux.it>
+Description:
+ The /sys/class/pps/ppsX/echo file reports if the X-th does
+ or does not support an "echo" function.
+
+What: /sys/class/pps/ppsX/name
+Date: February 2008
+Contact: Rodolfo Giometti <giometti@linux.it>
+Description:
+ The /sys/class/pps/ppsX/name file reports the name of the
+ X-th source.
+
+What: /sys/class/pps/ppsX/path
+Date: February 2008
+Contact: Rodolfo Giometti <giometti@linux.it>
+Description:
+ The /sys/class/pps/ppsX/path file reports the path name of
+ the device connected with the X-th source.
+
+ If the source is not connected with any device the content
+ of this file is empty.
diff --git a/Documentation/Changes b/Documentation/Changes
index b95082be4d5..6d0f1efc5bf 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -29,7 +29,7 @@ hardware, for example, you probably needn't concern yourself with
isdn4k-utils.
o Gnu C 3.2 # gcc --version
-o Gnu make 3.79.1 # make --version
+o Gnu make 3.80 # make --version
o binutils 2.12 # ld -v
o util-linux 2.10o # fdformat --version
o module-init-tools 0.9.10 # depmod -V
@@ -48,6 +48,7 @@ o procps 3.2.0 # ps --version
o oprofile 0.9 # oprofiled --version
o udev 081 # udevinfo -V
o grub 0.93 # grub --version
+o mcelog 0.6
Kernel compilation
==================
@@ -61,7 +62,7 @@ computer.
Make
----
-You will need Gnu make 3.79.1 or later to build the kernel.
+You will need Gnu make 3.80 or later to build the kernel.
Binutils
--------
@@ -71,6 +72,13 @@ assembling the 16-bit boot code, removing the need for as86 to compile
your kernel. This change does, however, mean that you need a recent
release of binutils.
+Perl
+----
+
+You will need perl 5 and the following modules: Getopt::Long, Getopt::Std,
+File::Basename, and File::Find to build the kernel.
+
+
System utilities
================
@@ -276,6 +284,16 @@ before running exportfs or mountd. It is recommended that all NFS
services be protected from the internet-at-large by a firewall where
that is possible.
+mcelog
+------
+
+In Linux 2.6.31+ the i386 kernel needs to run the mcelog utility
+as a regular cronjob similar to the x86-64 kernel to process and log
+machine check events when CONFIG_X86_NEW_MCE is enabled. Machine check
+events are errors reported by the CPU. Processing them is strongly encouraged.
+All x86-64 kernels since 2.6.4 require the mcelog utility to
+process machine checks.
+
Getting updated software
========================
@@ -365,6 +383,10 @@ FUSE
----
o <http://sourceforge.net/projects/fuse>
+mcelog
+------
+o <ftp://ftp.kernel.org/pub/linux/utils/cpu/mce/mcelog/>
+
Networking
**********
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle
index 72968cd5eaf..8bb37237ebd 100644
--- a/Documentation/CodingStyle
+++ b/Documentation/CodingStyle
@@ -698,8 +698,8 @@ very often is not. Abundant use of the inline keyword leads to a much bigger
kernel, which in turn slows the system as a whole down, due to a bigger
icache footprint for the CPU and simply because there is less memory
available for the pagecache. Just think about it; a pagecache miss causes a
-disk seek, which easily takes 5 miliseconds. There are a LOT of cpu cycles
-that can go into these 5 miliseconds.
+disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles
+that can go into these 5 milliseconds.
A reasonable rule of thumb is to not put inline at functions that have more
than 3 lines of code in them. An exception to this rule are the cases where
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index d9aa43d78bc..5aceb88b3f8 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -676,8 +676,8 @@ this directory the following files can currently be found:
dma-api/all_errors This file contains a numeric value. If this
value is not equal to zero the debugging code
will print a warning for every error it finds
- into the kernel log. Be carefull with this
- option. It can easily flood your logs.
+ into the kernel log. Be careful with this
+ option, as it can easily flood your logs.
dma-api/disabled This read-only file contains the character 'Y'
if the debugging code is disabled. This can
@@ -704,12 +704,24 @@ this directory the following files can currently be found:
The current number of free dma_debug_entries
in the allocator.
+ dma-api/driver-filter
+ You can write a name of a driver into this file
+ to limit the debug output to requests from that
+ particular driver. Write an empty string to
+ that file to disable the filter and see
+ all errors again.
+
If you have this code compiled into your kernel it will be enabled by default.
If you want to boot without the bookkeeping anyway you can provide
'dma_debug=off' as a boot parameter. This will disable DMA-API debugging.
Notice that you can not enable it again at runtime. You have to reboot to do
so.
+If you want to see debug messages only for a special device driver you can
+specify the dma_debug_driver=<drivername> parameter. This will enable the
+driver filter at boot time. The debug code will only print errors for that
+driver afterwards. This filter can be disabled or changed later using debugfs.
+
When the code disables itself at runtime this is most likely because it ran
out of dma_debug_entries. These entries are preallocated at boot. The number
of preallocated entries is defined per architecture. If it is too low for you
diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index b1eb661e630..9632444f6c6 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -13,7 +13,8 @@ DOCBOOKS := z8530book.xml mcabook.xml device-drivers.xml \
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
mac80211.xml debugobjects.xml sh.xml regulator.xml \
- alsa-driver-api.xml writing-an-alsa-driver.xml
+ alsa-driver-api.xml writing-an-alsa-driver.xml \
+ tracepoint.xml
###
# The build process is as follows (targets):
diff --git a/Documentation/DocBook/debugobjects.tmpl b/Documentation/DocBook/debugobjects.tmpl
index 7f5f218015f..08ff908aa7a 100644
--- a/Documentation/DocBook/debugobjects.tmpl
+++ b/Documentation/DocBook/debugobjects.tmpl
@@ -106,7 +106,7 @@
number of errors are printk'ed including a full stack trace.
</para>
<para>
- The statistics are available via debugfs/debug_objects/stats.
+ The statistics are available via /sys/kernel/debug/debug_objects/stats.
They provide information about the number of warnings and the
number of successful fixups along with information about the
usage of the internal tracking objects and the state of the
diff --git a/Documentation/DocBook/kernel-hacking.tmpl b/Documentation/DocBook/kernel-hacking.tmpl
index a50d6cd5857..992e67e6be7 100644
--- a/Documentation/DocBook/kernel-hacking.tmpl
+++ b/Documentation/DocBook/kernel-hacking.tmpl
@@ -449,8 +449,8 @@ printk(KERN_INFO "i = %u\n", i);
</para>
<programlisting>
-__u32 ipaddress;
-printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
+__be32 ipaddress;
+printk(KERN_INFO "my ip: %pI4\n", &amp;ipaddress);
</programlisting>
<para>
diff --git a/Documentation/DocBook/mac80211.tmpl b/Documentation/DocBook/mac80211.tmpl
index fbeaffc1dcc..f3f37f141db 100644
--- a/Documentation/DocBook/mac80211.tmpl
+++ b/Documentation/DocBook/mac80211.tmpl
@@ -145,7 +145,6 @@ usage should require reading the full document.
interface in STA mode at first!
</para>
!Finclude/net/mac80211.h ieee80211_if_init_conf
-!Finclude/net/mac80211.h ieee80211_if_conf
</chapter>
<chapter id="rx-tx">
@@ -185,8 +184,6 @@ usage should require reading the full document.
!Finclude/net/mac80211.h ieee80211_ctstoself_get
!Finclude/net/mac80211.h ieee80211_ctstoself_duration
!Finclude/net/mac80211.h ieee80211_generic_frame_duration
-!Finclude/net/mac80211.h ieee80211_get_hdrlen_from_skb
-!Finclude/net/mac80211.h ieee80211_hdrlen
!Finclude/net/mac80211.h ieee80211_wake_queue
!Finclude/net/mac80211.h ieee80211_stop_queue
!Finclude/net/mac80211.h ieee80211_wake_queues
diff --git a/Documentation/DocBook/tracepoint.tmpl b/Documentation/DocBook/tracepoint.tmpl
new file mode 100644
index 00000000000..b0756d0fd57
--- /dev/null
+++ b/Documentation/DocBook/tracepoint.tmpl
@@ -0,0 +1,89 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
+ "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
+
+<book id="Tracepoints">
+ <bookinfo>
+ <title>The Linux Kernel Tracepoint API</title>
+
+ <authorgroup>
+ <author>
+ <firstname>Jason</firstname>
+ <surname>Baron</surname>
+ <affiliation>
+ <address>
+ <email>jbaron@redhat.com</email>
+ </address>
+ </affiliation>
+ </author>
+ </authorgroup>
+
+ <legalnotice>
+ <para>
+ This documentation is free software; you can redistribute
+ it and/or modify it under the terms of the GNU General Public
+ License as published by the Free Software Foundation; either
+ version 2 of the License, or (at your option) any later
+ version.
+ </para>
+
+ <para>
+ This program is distributed in the hope that it will be
+ useful, but WITHOUT ANY WARRANTY; without even the implied
+ warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ See the GNU General Public License for more details.
+ </para>
+
+ <para>
+ You should have received a copy of the GNU General Public
+ License along with this program; if not, write to the Free
+ Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+ MA 02111-1307 USA
+ </para>
+
+ <para>
+ For more details see the file COPYING in the source
+ distribution of Linux.
+ </para>
+ </legalnotice>
+ </bookinfo>
+
+ <toc></toc>
+ <chapter id="intro">
+ <title>Introduction</title>
+ <para>
+ Tracepoints are static probe points that are located in strategic points
+ throughout the kernel. 'Probes' register/unregister with tracepoints
+ via a callback mechanism. The 'probes' are strictly typed functions that
+ are passed a unique set of parameters defined by each tracepoint.
+ </para>
+
+ <para>
+ From this simple callback mechanism, 'probes' can be used to profile, debug,
+ and understand kernel behavior. There are a number of tools that provide a
+ framework for using 'probes'. These tools include Systemtap, ftrace, and
+ LTTng.
+ </para>
+
+ <para>
+ Tracepoints are defined in a number of header files via various macros. Thus,
+ the purpose of this document is to provide a clear accounting of the available
+ tracepoints. The intention is to understand not only what tracepoints are
+ available but also to understand where future tracepoints might be added.
+ </para>
+
+ <para>
+ The API presented has functions of the form:
+ <function>trace_tracepointname(function parameters)</function>. These are the
+ tracepoints callbacks that are found throughout the code. Registering and
+ unregistering probes with these callback sites is covered in the
+ <filename>Documentation/trace/*</filename> directory.
+ </para>
+ </chapter>
+
+ <chapter id="irq">
+ <title>IRQ</title>
+!Iinclude/trace/events/irq.h
+ </chapter>
+
+</book>
diff --git a/Documentation/PCI/pcieaer-howto.txt b/Documentation/PCI/pcieaer-howto.txt
index ddeb14beacc..be21001ab14 100644
--- a/Documentation/PCI/pcieaer-howto.txt
+++ b/Documentation/PCI/pcieaer-howto.txt
@@ -61,6 +61,10 @@ be initiated although firmwares have no _OSC support. To enable the
walkaround, pls. add aerdriver.forceload=y to kernel boot parameter line
when booting kernel. Note that forceload=n by default.
+nosourceid, another parameter of type bool, can be used when broken
+hardware (mostly chipsets) has root ports that cannot obtain the reporting
+source ID. nosourceid=n by default.
+
2.3 AER error output
When a PCI-E AER error is captured, an error message will be outputed to
console. If it's a correctable error, it is outputed as a warning.
@@ -246,3 +250,24 @@ with the PCI Express AER Root driver?
A: It could call the helper functions to enable AER in devices and
cleanup uncorrectable status register. Pls. refer to section 3.3.
+
+4. Software error injection
+
+Debugging PCIE AER error recovery code is quite difficult because it
+is hard to trigger real hardware errors. Software based error
+injection can be used to fake various kinds of PCIE errors.
+
+First you should enable PCIE AER software error injection in kernel
+configuration, that is, following item should be in your .config.
+
+CONFIG_PCIEAER_INJECT=y or CONFIG_PCIEAER_INJECT=m
+
+After reboot with new kernel or insert the module, a device file named
+/dev/aer_inject should be created.
+
+Then, you need a user space tool named aer-inject, which can be gotten
+from:
+ http://www.kernel.org/pub/linux/utils/pci/aer-inject/
+
+More information about aer-inject can be found in the document comes
+with its source code.
diff --git a/Documentation/RCU/rculist_nulls.txt b/Documentation/RCU/rculist_nulls.txt
index 6389dec3345..18f9651ff23 100644
--- a/Documentation/RCU/rculist_nulls.txt
+++ b/Documentation/RCU/rculist_nulls.txt
@@ -83,11 +83,12 @@ not detect it missed following items in original chain.
obj = kmem_cache_alloc(...);
lock_chain(); // typically a spin_lock()
obj->key = key;
-atomic_inc(&obj->refcnt);
/*
* we need to make sure obj->key is updated before obj->next
+ * or obj->refcnt
*/
smp_wmb();
+atomic_set(&obj->refcnt, 1);
hlist_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()
@@ -118,7 +119,7 @@ to another chain) checking the final 'nulls' value if
the lookup met the end of chain. If final 'nulls' value
is not the slot number, then we must restart the lookup at
the beginning. If the object was moved to the same chain,
-then the reader doesnt care : It might eventually
+then the reader doesn't care : It might eventually
scan the list again without harm.
@@ -159,6 +160,10 @@ out:
obj = kmem_cache_alloc(cachep);
lock_chain(); // typically a spin_lock()
obj->key = key;
+/*
+ * changes to obj->key must be visible before refcnt one
+ */
+smp_wmb();
atomic_set(&obj->refcnt, 1);
/*
* insert obj in RCU way (readers might be traversing chain)
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index 068848240a8..02cced183b2 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -192,23 +192,24 @@ rcu/rcuhier (which displays the struct rcu_node hierarchy).
The output of "cat rcu/rcudata" looks as follows:
rcu:
- 0 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=1 rp=3c2a dt=23301/73 dn=2 df=1882 of=0 ri=2126 ql=2 b=10
- 1 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=3 rp=39a6 dt=78073/1 dn=2 df=1402 of=0 ri=1875 ql=46 b=10
- 2 c=4010 g=4010 pq=1 pqc=4010 qp=0 rpfq=-5 rp=1d12 dt=16646/0 dn=2 df=3140 of=0 ri=2080 ql=0 b=10
- 3 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=2b50 dt=21159/1 dn=2 df=2230 of=0 ri=1923 ql=72 b=10
- 4 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1644 dt=5783/1 dn=2 df=3348 of=0 ri=2805 ql=7 b=10
- 5 c=4012 g=4013 pq=0 pqc=4011 qp=1 rpfq=3 rp=1aac dt=5879/1 dn=2 df=3140 of=0 ri=2066 ql=10 b=10
- 6 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=ed8 dt=5847/1 dn=2 df=3797 of=0 ri=1266 ql=10 b=10
- 7 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1fa2 dt=6199/1 dn=2 df=2795 of=0 ri=2162 ql=28 b=10
+rcu:
+ 0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10
+ 1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10
+ 2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10
+ 3 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=6681/1 dn=0 df=1545 of=0 ri=0 ql=0 b=10
+ 4 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1003/1 dn=0 df=1992 of=0 ri=0 ql=0 b=10
+ 5 c=17829 g=17830 pq=1 pqc=17829 qp=1 dt=3887/1 dn=0 df=3331 of=0 ri=4 ql=2 b=10
+ 6 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=859/1 dn=0 df=3224 of=0 ri=0 ql=0 b=10
+ 7 c=17829 g=17830 pq=0 pqc=17829 qp=1 dt=3761/1 dn=0 df=1818 of=0 ri=0 ql=2 b=10
rcu_bh:
- 0 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-145 rp=21d6 dt=23301/73 dn=2 df=0 of=0 ri=0 ql=0 b=10
- 1 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-170 rp=20ce dt=78073/1 dn=2 df=26 of=0 ri=5 ql=0 b=10
- 2 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-83 rp=fbd dt=16646/0 dn=2 df=28 of=0 ri=4 ql=0 b=10
- 3 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-105 rp=178c dt=21159/1 dn=2 df=28 of=0 ri=2 ql=0 b=10
- 4 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-30 rp=b54 dt=5783/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
- 5 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-29 rp=df5 dt=5879/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
- 6 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-28 rp=788 dt=5847/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
- 7 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-53 rp=1098 dt=6199/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
+ 0 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=10951/1 dn=0 df=0 of=0 ri=0 ql=0 b=10
+ 1 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=16117/1 dn=0 df=13 of=0 ri=0 ql=0 b=10
+ 2 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1445/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
+ 3 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=6681/1 dn=0 df=9 of=0 ri=0 ql=0 b=10
+ 4 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1003/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
+ 5 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3887/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
+ 6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
+ 7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
The first section lists the rcu_data structures for rcu, the second for
rcu_bh. Each section has one line per CPU, or eight for this 8-CPU system.
@@ -253,12 +254,6 @@ o "pqc" indicates which grace period the last-observed quiescent
o "qp" indicates that RCU still expects a quiescent state from
this CPU.
-o "rpfq" is the number of rcu_pending() calls on this CPU required
- to induce this CPU to invoke force_quiescent_state().
-
-o "rp" is low-order four hex digits of the count of how many times
- rcu_pending() has been invoked on this CPU.
-
o "dt" is the current value of the dyntick counter that is incremented
when entering or leaving dynticks idle state, either by the
scheduler or by irq. The number after the "/" is the interrupt
@@ -305,6 +300,9 @@ o "b" is the batch limit for this CPU. If more than this number
of RCU callbacks is ready to invoke, then the remainder will
be deferred.
+There is also an rcu/rcudata.csv file with the same information in
+comma-separated-variable spreadsheet format.
+
The output of "cat rcu/rcugp" looks as follows:
@@ -411,3 +409,63 @@ o Each element of the form "1/1 0:127 ^0" represents one struct
For example, the first entry at the lowest level shows
"^0", indicating that it corresponds to bit zero in
the first entry at the middle level.
+
+
+The output of "cat rcu/rcu_pending" looks as follows:
+
+rcu:
+ 0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
+ 1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
+ 2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
+ 3 np=236249 qsp=48766 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723
+ 4 np=221310 qsp=46850 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110
+ 5 np=237332 qsp=48449 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456
+ 6 np=219995 qsp=46718 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834
+ 7 np=249893 qsp=49390 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888
+rcu_bh:
+ 0 np=146741 qsp=1419 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
+ 1 np=155792 qsp=12597 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
+ 2 np=136629 qsp=18680 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936
+ 3 np=137723 qsp=2843 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863
+ 4 np=123110 qsp=12433 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671
+ 5 np=137456 qsp=4210 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235
+ 6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
+ 7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
+
+As always, this is once again split into "rcu" and "rcu_bh" portions.
+The fields are as follows:
+
+o "np" is the number of times that __rcu_pending() has been invoked
+ for the corresponding flavor of RCU.
+
+o "qsp" is the number of times that the RCU was waiting for a
+ quiescent state from this CPU.
+
+o "cbr" is the number of times that this CPU had RCU callbacks
+ that had passed through a grace period, and were thus ready
+ to be invoked.
+
+o "cng" is the number of times that this CPU needed another
+ grace period while RCU was idle.
+
+o "gpc" is the number of times that an old grace period had
+ completed, but this CPU was not yet aware of it.
+
+o "gps" is the number of times that a new grace period had started,
+ but this CPU was not yet aware of it.
+
+o "nf" is the number of times that this CPU suspected that the
+ current grace period had run for too long, and thus needed to
+ be forced.
+
+ Please note that "forcing" consists of sending resched IPIs
+ to holdout CPUs. If that CPU really still is in an old RCU
+ read-side critical section, then we really do have to wait for it.
+ The assumption behing "forcing" is that the CPU is not still in
+ an old RCU read-side critical section, but has not yet responded
+ for some other reason.
+
+o "nn" is the number of times that this CPU needed nothing. Alert
+ readers will note that the rcu "nn" number for a given CPU very
+ closely matches the rcu_bh "np" number for that same CPU. This
+ is due to short-circuit evaluation in rcu_pending().
diff --git a/Documentation/SM501.txt b/Documentation/SM501.txt
index 6fc65603592..561826f8209 100644
--- a/Documentation/SM501.txt
+++ b/Documentation/SM501.txt
@@ -5,7 +5,7 @@ Copyright 2006, 2007 Simtec Electronics
The Silicon Motion SM501 multimedia companion chip is a multifunction device
which may provide numerous interfaces including USB host controller USB gadget,
-Asyncronous Serial ports, Audio functions and a dual display video interface.
+asynchronous serial ports, audio functions, and a dual display video interface.
The device may be connected by PCI or local bus with varying functions enabled.
Core
diff --git a/Documentation/Smack.txt b/Documentation/Smack.txt
index 629c92e9978..34614b4c708 100644
--- a/Documentation/Smack.txt
+++ b/Documentation/Smack.txt
@@ -184,8 +184,9 @@ length. Single character labels using special characters, that being anything
other than a letter or digit, are reserved for use by the Smack development
team. Smack labels are unstructured, case sensitive, and the only operation
ever performed on them is comparison for equality. Smack labels cannot
-contain unprintable characters or the "/" (slash) character. Smack labels
-cannot begin with a '-', which is reserved for special options.
+contain unprintable characters, the "/" (slash), the "\" (backslash), the "'"
+(quote) and '"' (double-quote) characters.
+Smack labels cannot begin with a '-', which is reserved for special options.
There are some predefined labels:
@@ -523,3 +524,18 @@ Smack supports some mount options:
These mount options apply to all file system types.
+Smack auditing
+
+If you want Smack auditing of security events, you need to set CONFIG_AUDIT
+in your kernel configuration.
+By default, all denied events will be audited. You can change this behavior by
+writing a single character to the /smack/logging file :
+0 : no logging
+1 : log denied (default)
+2 : log accepted
+3 : log denied & accepted
+
+Events are logged as 'key=value' pairs, for each event you at least will get
+the subjet, the object, the rights requested, the action, the kernel function
+that triggered the event, plus other pairs depending on the type of event
+audited.
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist
index ac5e0b2f109..78a9168ff37 100644
--- a/Documentation/SubmitChecklist
+++ b/Documentation/SubmitChecklist
@@ -54,7 +54,7 @@ kernel patches.
CONFIG_PREEMPT.
14: If the patch affects IO/Disk, etc: has been tested with and without
- CONFIG_LBD.
+ CONFIG_LBDAF.
15: All codepaths have been exercised with all lockdep features enabled.
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index f309d3c6221..5c555a8b39e 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -91,6 +91,10 @@ Be as specific as possible. The WORST descriptions possible include
things like "update driver X", "bug fix for driver X", or "this patch
includes updates for subsystem X. Please apply."
+The maintainer will thank you if you write your patch description in a
+form which can be easily pulled into Linux's source code management
+system, git, as a "commit log". See #15, below.
+
If your description starts to get long, that's a sign that you probably
need to split up your patch. See #3, next.
@@ -183,8 +187,9 @@ Even if the maintainer did not respond in step #4, make sure to ALWAYS
copy the maintainer when you change their code.
For small patches you may want to CC the Trivial Patch Monkey
-trivial@kernel.org managed by Jesper Juhl; which collects "trivial"
-patches. Trivial patches must qualify for one of the following rules:
+trivial@kernel.org which collects "trivial" patches. Have a look
+into the MAINTAINERS file for its current manager.
+Trivial patches must qualify for one of the following rules:
Spelling fixes in documentation
Spelling fixes which could break grep(1)
Warning fixes (cluttering with useless warnings is bad)
@@ -196,7 +201,6 @@ patches. Trivial patches must qualify for one of the following rules:
since people copy, as long as it's trivial)
Any fix by the author/maintainer of the file (ie. patch monkey
in re-transmission mode)
-URL: <http://www.kernel.org/pub/linux/kernel/people/juhl/trivial/>
@@ -405,7 +409,14 @@ person it names. This tag documents that potentially interested parties
have been included in the discussion
-14) Using Tested-by: and Reviewed-by:
+14) Using Reported-by:, Tested-by: and Reviewed-by:
+
+If this patch fixes a problem reported by somebody else, consider adding a
+Reported-by: tag to credit the reporter for their contribution. Please
+note that this tag should not be added without the reporter's permission,
+especially if the problem was not reported in a public forum. That said,
+if we diligently credit our bug reporters, they will, hopefully, be
+inspired to help us again in the future.
A Tested-by: tag indicates that the patch has been successfully tested (in
some environment) by the person named. This tag informs maintainers that
@@ -444,7 +455,7 @@ offer a Reviewed-by tag for a patch. This tag serves to give credit to
reviewers and to inform maintainers of the degree of review which has been
done on the patch. Reviewed-by: tags, when supplied by reviewers known to
understand the subject area and to perform thorough reviews, will normally
-increase the liklihood of your patch getting into the kernel.
+increase the likelihood of your patch getting into the kernel.
15) The canonical patch format
@@ -485,12 +496,33 @@ phrase" should not be a filename. Do not use the same "summary
phrase" for every patch in a whole patch series (where a "patch
series" is an ordered sequence of multiple, related patches).
-Bear in mind that the "summary phrase" of your email becomes
-a globally-unique identifier for that patch. It propagates
-all the way into the git changelog. The "summary phrase" may
-later be used in developer discussions which refer to the patch.
-People will want to google for the "summary phrase" to read
-discussion regarding that patch.
+Bear in mind that the "summary phrase" of your email becomes a
+globally-unique identifier for that patch. It propagates all the way
+into the git changelog. The "summary phrase" may later be used in
+developer discussions which refer to the patch. People will want to
+google for the "summary phrase" to read discussion regarding that
+patch. It will also be the only thing that people may quickly see
+when, two or three months later, they are going through perhaps
+thousands of patches using tools such as "gitk" or "git log
+--oneline".
+
+For these reasons, the "summary" must be no more than 70-75
+characters, and it must describe both what the patch changes, as well
+as why the patch might be necessary. It is challenging to be both
+succinct and descriptive, but that is what a well-written summary
+should do.
+
+The "summary phrase" may be prefixed by tags enclosed in square
+brackets: "Subject: [PATCH tag] <summary phrase>". The tags are not
+considered part of the summary phrase, but describe how the patch
+should be treated. Common tags might include a version descriptor if
+the multiple versions of the patch have been sent out in response to
+comments (i.e., "v1, v2, v3"), or "RFC" to indicate a request for
+comments. If there are four patches in a patch series the individual
+patches may be numbered like this: 1/4, 2/4, 3/4, 4/4. This assures
+that developers understand the order in which the patches should be
+applied and that they have reviewed or applied all of the patches in
+the patch series.
A couple of example Subjects:
@@ -510,19 +542,31 @@ the patch author in the changelog.
The explanation body will be committed to the permanent source
changelog, so should make sense to a competent reader who has long
since forgotten the immediate details of the discussion that might
-have led to this patch.
+have led to this patch. Including symptoms of the failure which the
+patch addresses (kernel log messages, oops messages, etc.) is
+especially useful for people who might be searching the commit logs
+looking for the applicable patch. If a patch fixes a compile failure,
+it may not be necessary to include _all_ of the compile failures; just
+enough that it is likely that someone searching for the patch can find
+it. As in the "summary phrase", it is important to be both succinct as
+well as descriptive.
The "---" marker line serves the essential purpose of marking for patch
handling tools where the changelog message ends.
One good use for the additional comments after the "---" marker is for
-a diffstat, to show what files have changed, and the number of inserted
-and deleted lines per file. A diffstat is especially useful on bigger
-patches. Other comments relevant only to the moment or the maintainer,
-not suitable for the permanent changelog, should also go here.
-Use diffstat options "-p 1 -w 70" so that filenames are listed from the
-top of the kernel source tree and don't use too much horizontal space
-(easily fit in 80 columns, maybe with some indentation).
+a diffstat, to show what files have changed, and the number of
+inserted and deleted lines per file. A diffstat is especially useful
+on bigger patches. Other comments relevant only to the moment or the
+maintainer, not suitable for the permanent changelog, should also go
+here. A good example of such comments might be "patch changelogs"
+which describe what has changed between the v1 and v2 version of the
+patch.
+
+If you are going to include a diffstat after the "---" marker, please
+use diffstat options "-p 1 -w 70" so that filenames are listed from
+the top of the kernel source tree and don't use too much horizontal
+space (easily fit in 80 columns, maybe with some indentation).
See more details on the proper patch format in the following
references.
diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c
index 7ea231172c8..aa73e72fd79 100644
--- a/Documentation/accounting/getdelays.c
+++ b/Documentation/accounting/getdelays.c
@@ -246,7 +246,8 @@ void print_ioacct(struct taskstats *t)
int main(int argc, char *argv[])
{
- int c, rc, rep_len, aggr_len, len2, cmd_type;
+ int c, rc, rep_len, aggr_len, len2;
+ int cmd_type = TASKSTATS_CMD_ATTR_UNSPEC;
__u16 id;
__u32 mypid;
diff --git a/Documentation/arm/Samsung-S3C24XX/GPIO.txt b/Documentation/arm/Samsung-S3C24XX/GPIO.txt
index ea7ccfc4b27..948c8718d96 100644
--- a/Documentation/arm/Samsung-S3C24XX/GPIO.txt
+++ b/Documentation/arm/Samsung-S3C24XX/GPIO.txt
@@ -51,7 +51,7 @@ PIN Numbers
-----------
Each pin has an unique number associated with it in regs-gpio.h,
- eg S3C2410_GPA0 or S3C2410_GPF1. These defines are used to tell
+ eg S3C2410_GPA(0) or S3C2410_GPF(1). These defines are used to tell
the GPIO functions which pin is to be used.
@@ -65,11 +65,11 @@ Configuring a pin
Eg:
- s3c2410_gpio_cfgpin(S3C2410_GPA0, S3C2410_GPA0_ADDR0);
- s3c2410_gpio_cfgpin(S3C2410_GPE8, S3C2410_GPE8_SDDAT1);
+ s3c2410_gpio_cfgpin(S3C2410_GPA(0), S3C2410_GPA0_ADDR0);
+ s3c2410_gpio_cfgpin(S3C2410_GPE(8), S3C2410_GPE8_SDDAT1);
- which would turn GPA0 into the lowest Address line A0, and set
- GPE8 to be connected to the SDIO/MMC controller's SDDAT1 line.
+ which would turn GPA(0) into the lowest Address line A0, and set
+ GPE(8) to be connected to the SDIO/MMC controller's SDDAT1 line.
Reading the current configuration
diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt
index 43cb1004d35..9d58c7c5edd 100644
--- a/Documentation/arm/memory.txt
+++ b/Documentation/arm/memory.txt
@@ -21,6 +21,8 @@ ffff8000 ffffffff copy_user_page / clear_user_page use.
For SA11xx and Xscale, this is used to
setup a minicache mapping.
+ffff4000 ffffffff cache aliasing on ARMv6 and later CPUs.
+
ffff1000 ffff7fff Reserved.
Platforms must not use this address range.
diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
index 4ef24501045..396bec3b74e 100644
--- a/Documentation/atomic_ops.txt
+++ b/Documentation/atomic_ops.txt
@@ -229,10 +229,10 @@ kernel. It is the use of atomic counters to implement reference
counting, and it works such that once the counter falls to zero it can
be guaranteed that no other entity can be accessing the object:
-static void obj_list_add(struct obj *obj)
+static void obj_list_add(struct obj *obj, struct list_head *head)
{
obj->active = 1;
- list_add(&obj->list);
+ list_add(&obj->list, head);
}
static void obj_list_del(struct obj *obj)
diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index 6fab97ea7e6..8d2158a1c6a 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -186,7 +186,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address
do not have a corresponding kernel virtual address space mapping) and
low-memory pages.
-Note: Please refer to Documentation/PCI/PCI-DMA-mapping.txt for a discussion
+Note: Please refer to Documentation/DMA-mapping.txt for a discussion
on PCI high mem DMA aspects and mapping of scatter gather lists, and support
for 64 bit PCI.
diff --git a/Documentation/block/data-integrity.txt b/Documentation/block/data-integrity.txt
index e8ca040ba2c..2d735b0ae38 100644
--- a/Documentation/block/data-integrity.txt
+++ b/Documentation/block/data-integrity.txt
@@ -50,7 +50,7 @@ encouraged them to allow separation of the data and integrity metadata
scatter-gather lists.
The controller will interleave the buffers on write and split them on
-read. This means that the Linux can DMA the data buffers to and from
+read. This means that Linux can DMA the data buffers to and from
host memory without changes to the page cache.
Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
@@ -66,7 +66,7 @@ software RAID5).
The IP checksum is weaker than the CRC in terms of detecting bit
errors. However, the strength is really in the separation of the data
-buffers and the integrity metadata. These two distinct buffers much
+buffers and the integrity metadata. These two distinct buffers must
match up for an I/O to complete.
The separation of the data and integrity metadata buffers as well as
diff --git a/Documentation/block/deadline-iosched.txt b/Documentation/block/deadline-iosched.txt
index 72576769e0f..2d82c80322c 100644
--- a/Documentation/block/deadline-iosched.txt
+++ b/Documentation/block/deadline-iosched.txt
@@ -58,7 +58,7 @@ same criteria as reads.
front_merges (bool)
------------
-Sometimes it happens that a request enters the io scheduler that is contigious
+Sometimes it happens that a request enters the io scheduler that is contiguous
with a request that is already on the queue. Either it fits in the back of that
request, or it fits at the front. That is called either a back merge candidate
or a front merge candidate. Due to the way files are typically laid out,
diff --git a/Documentation/braille-console.txt b/Documentation/braille-console.txt
index 000b0fbdc10..d0d042c2fd5 100644
--- a/Documentation/braille-console.txt
+++ b/Documentation/braille-console.txt
@@ -27,7 +27,7 @@ parameter.
For simplicity, only one braille console can be enabled, other uses of
console=brl,... will be discarded. Also note that it does not interfere with
-the console selection mecanism described in serial-console.txt
+the console selection mechanism described in serial-console.txt
For now, only the VisioBraille device is supported.
diff --git a/Documentation/cdrom/packet-writing.txt b/Documentation/cdrom/packet-writing.txt
index cf1f8126991..1c407778c8b 100644
--- a/Documentation/cdrom/packet-writing.txt
+++ b/Documentation/cdrom/packet-writing.txt
@@ -117,7 +117,7 @@ Using the pktcdvd debugfs interface
To read pktcdvd device infos in human readable form, do:
- # cat /debug/pktcdvd/pktcdvd[0-7]/info
+ # cat /sys/kernel/debug/pktcdvd/pktcdvd[0-7]/info
For a description of the debugfs interface look into the file:
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index f9ca389dddf..1d7e9784439 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -777,6 +777,18 @@ in cpuset directories:
# /bin/echo 1-4 > cpus -> set cpus list to cpus 1,2,3,4
# /bin/echo 1,2,3,4 > cpus -> set cpus list to cpus 1,2,3,4
+To add a CPU to a cpuset, write the new list of CPUs including the
+CPU to be added. To add 6 to the above cpuset:
+
+# /bin/echo 1-4,6 > cpus -> set cpus list to cpus 1,2,3,4,6
+
+Similarly to remove a CPU from a cpuset, write the new list of CPUs
+without the CPU to be removed.
+
+To remove all the CPUs:
+
+# /bin/echo "" > cpus -> clear cpus list
+
2.3 Setting flags
-----------------
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 1a608877b14..23d1262c077 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -152,14 +152,19 @@ When swap is accounted, following files are added.
usage of mem+swap is limited by memsw.limit_in_bytes.
-Note: why 'mem+swap' rather than swap.
+* why 'mem+swap' rather than swap.
The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
to move account from memory to swap...there is no change in usage of
-mem+swap.
+mem+swap. In other words, when we want to limit the usage of swap without
+affecting global LRU, mem+swap limit is better than just limiting swap from
+OS point of view.
-In other words, when we want to limit the usage of swap without affecting
-global LRU, mem+swap limit is better than just limiting swap from OS point
-of view.
+* What happens when a cgroup hits memory.memsw.limit_in_bytes
+When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out
+in this cgroup. Then, swap-out will not be done by cgroup routine and file
+caches are dropped. But as mentioned above, global LRU can do swapout memory
+from it for sanity of the system's memory management state. You can't forbid
+it by cgroup.
2.5 Reclaim
@@ -204,6 +209,7 @@ We can alter the memory limit:
NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
mega or gigabytes.
+NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited).
# cat /cgroups/0/memory.limit_in_bytes
4194304
diff --git a/Documentation/connector/cn_test.c b/Documentation/connector/cn_test.c
index 6977c178729..6a5be5d5c8e 100644
--- a/Documentation/connector/cn_test.c
+++ b/Documentation/connector/cn_test.c
@@ -1,7 +1,7 @@
/*
* cn_test.c
*
- * 2004-2005 Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru>
+ * 2004+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
* All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
@@ -41,6 +41,12 @@ void cn_test_callback(void *data)
msg->seq, msg->ack, msg->len, (char *)msg->data);
}
+/*
+ * Do not remove this function even if no one is using it as
+ * this is an example of how to get notifications about new
+ * connector user registration
+ */
+#if 0
static int cn_test_want_notify(void)
{
struct cn_ctl_msg *ctl;
@@ -117,6 +123,7 @@ nlmsg_failure:
kfree_skb(skb);
return -EINVAL;
}
+#endif
static u32 cn_test_timer_counter;
static void cn_test_timer_func(unsigned long __data)
@@ -187,5 +194,5 @@ module_init(cn_test_init);
module_exit(cn_test_fini);
MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Evgeniy Polyakov <johnpol@2ka.mipt.ru>");
+MODULE_AUTHOR("Evgeniy Polyakov <zbr@ioremap.net>");
MODULE_DESCRIPTION("Connector's test module");
diff --git a/Documentation/connector/ucon.c b/Documentation/connector/ucon.c
index d738cde2a8d..c5092ad0ce4 100644
--- a/Documentation/connector/ucon.c
+++ b/Documentation/connector/ucon.c
@@ -1,7 +1,7 @@
/*
* ucon.c
*
- * Copyright (c) 2004+ Evgeniy Polyakov <johnpol@2ka.mipt.ru>
+ * Copyright (c) 2004+ Evgeniy Polyakov <zbr@ioremap.net>
*
*
* This program is free software; you can redistribute it and/or modify
diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt
index 43c743903dd..75a58d14d3c 100644
--- a/Documentation/cpu-freq/cpu-drivers.txt
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -155,7 +155,7 @@ actual frequency must be determined using the following rules:
- if relation==CPUFREQ_REL_H, try to select a new_freq lower than or equal
target_freq. ("H for highest, but no higher than")
-Here again the frequency table helper might assist you - see section 3
+Here again the frequency table helper might assist you - see section 2
for details.
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
index ce73f3eb5dd..aed082f49d0 100644
--- a/Documentation/cpu-freq/governors.txt
+++ b/Documentation/cpu-freq/governors.txt
@@ -119,10 +119,6 @@ want the kernel to look at the CPU usage and to make decisions on
what to do about the frequency. Typically this is set to values of
around '10000' or more. It's default value is (cmp. with users-guide.txt):
transition_latency * 1000
-The lowest value you can set is:
-transition_latency * 100 or it may get restricted to a value where it
-makes not sense for the kernel anymore to poll that often which depends
-on your HZ config variable (HZ=1000: max=20000us, HZ=250: max=5000).
Be aware that transition latency is in ns and sampling_rate is in us, so you
get the same sysfs value by default.
Sampling rate should always get adjusted considering the transition latency
@@ -131,14 +127,20 @@ in the bash (as said, 1000 is default), do:
echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) \
>ondemand/sampling_rate
-show_sampling_rate_(min|max): THIS INTERFACE IS DEPRECATED, DON'T USE IT.
-You can use wider ranges now and the general
-cpuinfo_transition_latency variable (cmp. with user-guide.txt) can be
-used to obtain exactly the same info:
-show_sampling_rate_min = transtition_latency * 500 / 1000
-show_sampling_rate_max = transtition_latency * 500000 / 1000
-(divided by 1000 is to illustrate that sampling rate is in us and
-transition latency is exported ns).
+show_sampling_rate_min:
+The sampling rate is limited by the HW transition latency:
+transition_latency * 100
+Or by kernel restrictions:
+If CONFIG_NO_HZ is set, the limit is 10ms fixed.
+If CONFIG_NO_HZ is not set or no_hz=off boot parameter is used, the
+limits depend on the CONFIG_HZ option:
+HZ=1000: min=20000us (20ms)
+HZ=250: min=80000us (80ms)
+HZ=100: min=200000us (200ms)
+The highest value of kernel and HW latency restrictions is shown and
+used as the minimum sampling rate.
+
+show_sampling_rate_max: THIS INTERFACE IS DEPRECATED, DON'T USE IT.
up_threshold: defines what the average CPU usage between the samplings
of 'sampling_rate' needs to be for the kernel to make a decision on
diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt
index 75f41193f3e..5d5f5fadd1c 100644
--- a/Documentation/cpu-freq/user-guide.txt
+++ b/Documentation/cpu-freq/user-guide.txt
@@ -31,7 +31,6 @@ Contents:
3. How to change the CPU cpufreq policy and/or speed
3.1 Preferred interface: sysfs
-3.2 Deprecated interfaces
diff --git a/Documentation/dell_rbu.txt b/Documentation/dell_rbu.txt
index c11b931f8f9..15174985ad0 100644
--- a/Documentation/dell_rbu.txt
+++ b/Documentation/dell_rbu.txt
@@ -76,9 +76,9 @@ Do the steps below to download the BIOS image.
The /sys/class/firmware/dell_rbu/ entries will remain till the following is
done.
-echo -1 > /sys/class/firmware/dell_rbu/loading.
+echo -1 > /sys/class/firmware/dell_rbu/loading
Until this step is completed the driver cannot be unloaded.
-Also echoing either mono ,packet or init in to image_type will free up the
+Also echoing either mono, packet or init in to image_type will free up the
memory allocated by the driver.
If a user by accident executes steps 1 and 3 above without executing step 2;
diff --git a/Documentation/development-process/5.Posting b/Documentation/development-process/5.Posting
index dd48132a74d..f622c1e9f0f 100644
--- a/Documentation/development-process/5.Posting
+++ b/Documentation/development-process/5.Posting
@@ -119,7 +119,7 @@ which takes quite a bit of time and thought after the "real work" has been
done. When done properly, though, it is time well spent.
-5.4: PATCH FORMATTING
+5.4: PATCH FORMATTING AND CHANGELOGS
So now you have a perfect series of patches for posting, but the work is
not done quite yet. Each patch needs to be formatted into a message which
@@ -146,8 +146,33 @@ that end, each patch will be composed of the following:
- One or more tag lines, with, at a minimum, one Signed-off-by: line from
the author of the patch. Tags will be described in more detail below.
-The above three items should, normally, be the text used when committing
-the change to a revision control system. They are followed by:
+The items above, together, form the changelog for the patch. Writing good
+changelogs is a crucial but often-neglected art; it's worth spending
+another moment discussing this issue. When writing a changelog, you should
+bear in mind that a number of different people will be reading your words.
+These include subsystem maintainers and reviewers who need to decide
+whether the patch should be included, distributors and other maintainers
+trying to decide whether a patch should be backported to other kernels, bug
+hunters wondering whether the patch is responsible for a problem they are
+chasing, users who want to know how the kernel has changed, and more. A
+good changelog conveys the needed information to all of these people in the
+most direct and concise way possible.
+
+To that end, the summary line should describe the effects of and motivation
+for the change as well as possible given the one-line constraint. The
+detailed description can then amplify on those topics and provide any
+needed additional information. If the patch fixes a bug, cite the commit
+which introduced the bug if possible. If a problem is associated with
+specific log or compiler output, include that output to help others
+searching for a solution to the same problem. If the change is meant to
+support other changes coming in later patch, say so. If internal APIs are
+changed, detail those changes and how other developers should respond. In
+general, the more you can put yourself into the shoes of everybody who will
+be reading your changelog, the better that changelog (and the kernel as a
+whole) will be.
+
+Needless to say, the changelog should be the text used when committing the
+change to a revision control system. It will be followed by:
- The patch itself, in the unified ("-u") patch format. Using the "-p"
option to diff will associate function names with changes, making the
diff --git a/Documentation/device-mapper/dm-log.txt b/Documentation/device-mapper/dm-log.txt
new file mode 100644
index 00000000000..994dd75475a
--- /dev/null
+++ b/Documentation/device-mapper/dm-log.txt
@@ -0,0 +1,54 @@
+Device-Mapper Logging
+=====================
+The device-mapper logging code is used by some of the device-mapper
+RAID targets to track regions of the disk that are not consistent.
+A region (or portion of the address space) of the disk may be
+inconsistent because a RAID stripe is currently being operated on or
+a machine died while the region was being altered. In the case of
+mirrors, a region would be considered dirty/inconsistent while you
+are writing to it because the writes need to be replicated for all
+the legs of the mirror and may not reach the legs at the same time.
+Once all writes are complete, the region is considered clean again.
+
+There is a generic logging interface that the device-mapper RAID
+implementations use to perform logging operations (see
+dm_dirty_log_type in include/linux/dm-dirty-log.h). Various different
+logging implementations are available and provide different
+capabilities. The list includes:
+
+Type Files
+==== =====
+disk drivers/md/dm-log.c
+core drivers/md/dm-log.c
+userspace drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h
+
+The "disk" log type
+-------------------
+This log implementation commits the log state to disk. This way, the
+logging state survives reboots/crashes.
+
+The "core" log type
+-------------------
+This log implementation keeps the log state in memory. The log state
+will not survive a reboot or crash, but there may be a small boost in
+performance. This method can also be used if no storage device is
+available for storing log state.
+
+The "userspace" log type
+------------------------
+This log type simply provides a way to export the log API to userspace,
+so log implementations can be done there. This is done by forwarding most
+logging requests to userspace, where a daemon receives and processes the
+request.
+
+The structure used for communication between kernel and userspace are
+located in include/linux/dm-log-userspace.h. Due to the frequency,
+diversity, and 2-way communication nature of the exchanges between
+kernel and userspace, 'connector' is used as the interface for
+communication.
+
+There are currently two userspace log implementations that leverage this
+framework - "clustered_disk" and "clustered_core". These implementations
+provide a cluster-coherent log for shared-storage. Device-mapper mirroring
+can be used in a shared-storage environment when the cluster log implementations
+are employed.
diff --git a/Documentation/device-mapper/dm-queue-length.txt b/Documentation/device-mapper/dm-queue-length.txt
new file mode 100644
index 00000000000..f4db2562175
--- /dev/null
+++ b/Documentation/device-mapper/dm-queue-length.txt
@@ -0,0 +1,39 @@
+dm-queue-length
+===============
+
+dm-queue-length is a path selector module for device-mapper targets,
+which selects a path with the least number of in-flight I/Os.
+The path selector name is 'queue-length'.
+
+Table parameters for each path: [<repeat_count>]
+ <repeat_count>: The number of I/Os to dispatch using the selected
+ path before switching to the next path.
+ If not given, internal default is used. To check
+ the default value, see the activated table.
+
+Status for each path: <status> <fail-count> <in-flight>
+ <status>: 'A' if the path is active, 'F' if the path is failed.
+ <fail-count>: The number of path failures.
+ <in-flight>: The number of in-flight I/Os on the path.
+
+
+Algorithm
+=========
+
+dm-queue-length increments/decrements 'in-flight' when an I/O is
+dispatched/completed respectively.
+dm-queue-length selects a path with the minimum 'in-flight'.
+
+
+Examples
+========
+In case that 2 paths (sda and sdb) are used with repeat_count == 128.
+
+# echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
+ dmsetup create test
+#
+# dmsetup table
+test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
+#
+# dmsetup status
+test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0
diff --git a/Documentation/device-mapper/dm-service-time.txt b/Documentation/device-mapper/dm-service-time.txt
new file mode 100644
index 00000000000..7d00668e97b
--- /dev/null
+++ b/Documentation/device-mapper/dm-service-time.txt
@@ -0,0 +1,91 @@
+dm-service-time
+===============
+
+dm-service-time is a path selector module for device-mapper targets,
+which selects a path with the shortest estimated service time for
+the incoming I/O.
+
+The service time for each path is estimated by dividing the total size
+of in-flight I/Os on a path with the performance value of the path.
+The performance value is a relative throughput value among all paths
+in a path-group, and it can be specified as a table argument.
+
+The path selector name is 'service-time'.
+
+Table parameters for each path: [<repeat_count> [<relative_throughput>]]
+ <repeat_count>: The number of I/Os to dispatch using the selected
+ path before switching to the next path.
+ If not given, internal default is used. To check
+ the default value, see the activated table.
+ <relative_throughput>: The relative throughput value of the path
+ among all paths in the path-group.
+ The valid range is 0-100.
+ If not given, minimum value '1' is used.
+ If '0' is given, the path isn't selected while
+ other paths having a positive value are available.
+
+Status for each path: <status> <fail-count> <in-flight-size> \
+ <relative_throughput>
+ <status>: 'A' if the path is active, 'F' if the path is failed.
+ <fail-count>: The number of path failures.
+ <in-flight-size>: The size of in-flight I/Os on the path.
+ <relative_throughput>: The relative throughput value of the path
+ among all paths in the path-group.
+
+
+Algorithm
+=========
+
+dm-service-time adds the I/O size to 'in-flight-size' when the I/O is
+dispatched and substracts when completed.
+Basically, dm-service-time selects a path having minimum service time
+which is calculated by:
+
+ ('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'
+
+However, some optimizations below are used to reduce the calculation
+as much as possible.
+
+ 1. If the paths have the same 'relative_throughput', skip
+ the division and just compare the 'in-flight-size'.
+
+ 2. If the paths have the same 'in-flight-size', skip the division
+ and just compare the 'relative_throughput'.
+
+ 3. If some paths have non-zero 'relative_throughput' and others
+ have zero 'relative_throughput', ignore those paths with zero
+ 'relative_throughput'.
+
+If such optimizations can't be applied, calculate service time, and
+compare service time.
+If calculated service time is equal, the path having maximum
+'relative_throughput' may be better. So compare 'relative_throughput'
+then.
+
+
+Examples
+========
+In case that 2 paths (sda and sdb) are used with repeat_count == 128
+and sda has an average throughput 1GB/s and sdb has 4GB/s,
+'relative_throughput' value may be '1' for sda and '4' for sdb.
+
+# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \
+ dmsetup create test
+#
+# dmsetup table
+test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
+#
+# dmsetup status
+test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
+
+
+Or '2' for sda and '8' for sdb would be also true.
+
+# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \
+ dmsetup create test
+#
+# dmsetup table
+test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
+#
+# dmsetup status
+test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8
diff --git a/Documentation/driver-model/device.txt b/Documentation/driver-model/device.txt
index a7cbfff40d0..a124f3126b0 100644
--- a/Documentation/driver-model/device.txt
+++ b/Documentation/driver-model/device.txt
@@ -162,3 +162,35 @@ device_remove_file(dev,&dev_attr_power);
The file name will be 'power' with a mode of 0644 (-rw-r--r--).
+Word of warning: While the kernel allows device_create_file() and
+device_remove_file() to be called on a device at any time, userspace has
+strict expectations on when attributes get created. When a new device is
+registered in the kernel, a uevent is generated to notify userspace (like
+udev) that a new device is available. If attributes are added after the
+device is registered, then userspace won't get notified and userspace will
+not know about the new attributes.
+
+This is important for device driver that need to publish additional
+attributes for a device at driver probe time. If the device driver simply
+calls device_create_file() on the device structure passed to it, then
+userspace will never be notified of the new attributes. Instead, it should
+probably use class_create() and class->dev_attrs to set up a list of
+desired attributes in the modules_init function, and then in the .probe()
+hook, and then use device_create() to create a new device as a child
+of the probed device. The new device will generate a new uevent and
+properly advertise the new attributes to userspace.
+
+For example, if a driver wanted to add the following attributes:
+struct device_attribute mydriver_attribs[] = {
+ __ATTR(port_count, 0444, port_count_show),
+ __ATTR(serial_number, 0444, serial_number_show),
+ NULL
+};
+
+Then in the module init function is would do:
+ mydriver_class = class_create(THIS_MODULE, "my_attrs");
+ mydriver_class.dev_attr = mydriver_attribs;
+
+And assuming 'dev' is the struct device passed into the probe hook, the driver
+probe function would do something like:
+ create_device(&mydriver_class, dev, chrdev, &private_data, "my_name");
diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt
index 387b8a720f4..d79aead9418 100644
--- a/Documentation/driver-model/devres.txt
+++ b/Documentation/driver-model/devres.txt
@@ -188,7 +188,7 @@ For example, you can do something like the following.
void my_midlayer_destroy_something()
{
- devres_release_group(dev, my_midlayer_create_soemthing);
+ devres_release_group(dev, my_midlayer_create_something);
}
diff --git a/Documentation/driver-model/driver.txt b/Documentation/driver-model/driver.txt
index 82132169d47..60120fb3b96 100644
--- a/Documentation/driver-model/driver.txt
+++ b/Documentation/driver-model/driver.txt
@@ -207,8 +207,8 @@ Attributes
~~~~~~~~~~
struct driver_attribute {
struct attribute attr;
- ssize_t (*show)(struct device_driver *, char * buf, size_t count, loff_t off);
- ssize_t (*store)(struct device_driver *, const char * buf, size_t count, loff_t off);
+ ssize_t (*show)(struct device_driver *driver, char *buf);
+ ssize_t (*store)(struct device_driver *, const char * buf, size_t count);
};
Device drivers can export attributes via their sysfs directories.
diff --git a/Documentation/dvb/get_dvb_firmware b/Documentation/dvb/get_dvb_firmware
index 2f21ecd4c20..3d1b0ab70c8 100644
--- a/Documentation/dvb/get_dvb_firmware
+++ b/Documentation/dvb/get_dvb_firmware
@@ -25,7 +25,7 @@ use IO::Handle;
"tda10046lifeview", "av7110", "dec2000t", "dec2540t",
"dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004",
"or51211", "or51132_qam", "or51132_vsb", "bluebird",
- "opera1", "cx231xx", "cx18", "cx23885", "pvrusb2" );
+ "opera1", "cx231xx", "cx18", "cx23885", "pvrusb2", "mpc718" );
# Check args
syntax() if (scalar(@ARGV) != 1);
@@ -112,7 +112,7 @@ sub tda10045 {
sub tda10046 {
my $sourcefile = "TT_PCI_2.19h_28_11_2006.zip";
- my $url = "http://technotrend-online.com/download/software/219/$sourcefile";
+ my $url = "http://www.tt-download.com/download/updates/219/$sourcefile";
my $hash = "6a7e1e2f2644b162ff0502367553c72d";
my $outfile = "dvb-fe-tda10046.fw";
my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
@@ -129,8 +129,8 @@ sub tda10046 {
}
sub tda10046lifeview {
- my $sourcefile = "Drv_2.11.02.zip";
- my $url = "http://www.lifeview.com.tw/drivers/pci_card/FlyDVB-T/$sourcefile";
+ my $sourcefile = "7%5Cdrv_2.11.02.zip";
+ my $url = "http://www.lifeview.hk/dbimages/document/$sourcefile";
my $hash = "1ea24dee4eea8fe971686981f34fd2e0";
my $outfile = "dvb-fe-tda10046.fw";
my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
@@ -317,7 +317,7 @@ sub nxt2002 {
sub nxt2004 {
my $sourcefile = "AVerTVHD_MCE_A180_Drv_v1.2.2.16.zip";
- my $url = "http://www.aver.com/support/Drivers/$sourcefile";
+ my $url = "http://www.avermedia-usa.com/support/Drivers/$sourcefile";
my $hash = "111cb885b1e009188346d72acfed024c";
my $outfile = "dvb-fe-nxt2004.fw";
my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
@@ -381,6 +381,57 @@ sub cx18 {
$allfiles;
}
+sub mpc718 {
+ my $archive = 'Yuan MPC718 TV Tuner Card 2.13.10.1016.zip';
+ my $url = "ftp://ftp.work.acer-euro.com/desktop/aspire_idea510/vista/Drivers/$archive";
+ my $fwfile = "dvb-cx18-mpc718-mt352.fw";
+ my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
+
+ checkstandard();
+ wgetfile($archive, $url);
+ unzip($archive, $tmpdir);
+
+ my $sourcefile = "$tmpdir/Yuan MPC718 TV Tuner Card 2.13.10.1016/mpc718_32bit/yuanrap.sys";
+ my $found = 0;
+
+ open IN, '<', $sourcefile or die "Couldn't open $sourcefile to extract $fwfile data\n";
+ binmode IN;
+ open OUT, '>', $fwfile;
+ binmode OUT;
+ {
+ # Block scope because we change the line terminator variable $/
+ my $prevlen = 0;
+ my $currlen;
+
+ # Buried in the data segment are 3 runs of almost identical
+ # register-value pairs that end in 0x5d 0x01 which is a "TUNER GO"
+ # command for the MT352.
+ # Pull out the middle run (because it's easy) of register-value
+ # pairs to make the "firmware" file.
+
+ local $/ = "\x5d\x01"; # MT352 "TUNER GO"
+
+ while (<IN>) {
+ $currlen = length($_);
+ if ($prevlen == $currlen && $currlen <= 64) {
+ chop; chop; # Get rid of "TUNER GO"
+ s/^\0\0//; # get rid of leading 00 00 if it's there
+ printf OUT "$_";
+ $found = 1;
+ last;
+ }
+ $prevlen = $currlen;
+ }
+ }
+ close OUT;
+ close IN;
+ if (!$found) {
+ unlink $fwfile;
+ die "Couldn't find valid register-value sequence in $sourcefile for $fwfile\n";
+ }
+ $fwfile;
+}
+
sub cx23885 {
my $url = "http://linuxtv.org/downloads/firmware/";
diff --git a/Documentation/edac.txt b/Documentation/edac.txt
index 8eda3fb6641..06f8f46692d 100644
--- a/Documentation/edac.txt
+++ b/Documentation/edac.txt
@@ -23,8 +23,8 @@ first time, it was renamed to 'EDAC'.
The bluesmoke project at sourceforge.net is now utilized as a 'staging area'
for EDAC development, before it is sent upstream to kernel.org
-At the bluesmoke/EDAC project site, is a series of quilt patches against
-recent kernels, stored in a SVN respository. For easier downloading, there
+At the bluesmoke/EDAC project site is a series of quilt patches against
+recent kernels, stored in a SVN repository. For easier downloading, there
is also a tarball snapshot available.
============================================================================
@@ -73,9 +73,9 @@ the vendor should tie the parity status bits to 0 if they do not intend
to generate parity. Some vendors do not do this, and thus the parity bit
can "float" giving false positives.
-In the kernel there is a pci device attribute located in sysfs that is
+In the kernel there is a PCI device attribute located in sysfs that is
checked by the EDAC PCI scanning code. If that attribute is set,
-PCI parity/error scannining is skipped for that device. The attribute
+PCI parity/error scanning is skipped for that device. The attribute
is:
broken_parity_status
diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
index 4bc374a1434..07930564079 100644
--- a/Documentation/fault-injection/fault-injection.txt
+++ b/Documentation/fault-injection/fault-injection.txt
@@ -29,16 +29,16 @@ o debugfs entries
fault-inject-debugfs kernel module provides some debugfs entries for runtime
configuration of fault-injection capabilities.
-- /debug/fail*/probability:
+- /sys/kernel/debug/fail*/probability:
likelihood of failure injection, in percent.
Format: <percent>
Note that one-failure-per-hundred is a very high error rate
for some testcases. Consider setting probability=100 and configure
- /debug/fail*/interval for such testcases.
+ /sys/kernel/debug/fail*/interval for such testcases.
-- /debug/fail*/interval:
+- /sys/kernel/debug/fail*/interval:
specifies the interval between failures, for calls to
should_fail() that pass all the other tests.
@@ -46,18 +46,18 @@ configuration of fault-injection capabilities.
Note that if you enable this, by setting interval>1, you will
probably want to set probability=100.
-- /debug/fail*/times:
+- /sys/kernel/debug/fail*/times:
specifies how many times failures may happen at most.
A value of -1 means "no limit".
-- /debug/fail*/space:
+- /sys/kernel/debug/fail*/space:
specifies an initial resource "budget", decremented by "size"
on each call to should_fail(,size). Failure injection is
suppressed until "space" reaches zero.
-- /debug/fail*/verbose
+- /sys/kernel/debug/fail*/verbose
Format: { 0 | 1 | 2 }
specifies the verbosity of the messages when failure is
@@ -65,17 +65,17 @@ configuration of fault-injection capabilities.
log line per failure; '2' will print a call trace too -- useful
to debug the problems revealed by fault injection.
-- /debug/fail*/task-filter:
+- /sys/kernel/debug/fail*/task-filter:
Format: { 'Y' | 'N' }
A value of 'N' disables filtering by process (default).
Any positive value limits failures to only processes indicated by
/proc/<pid>/make-it-fail==1.
-- /debug/fail*/require-start:
-- /debug/fail*/require-end:
-- /debug/fail*/reject-start:
-- /debug/fail*/reject-end:
+- /sys/kernel/debug/fail*/require-start:
+- /sys/kernel/debug/fail*/require-end:
+- /sys/kernel/debug/fail*/reject-start:
+- /sys/kernel/debug/fail*/reject-end:
specifies the range of virtual addresses tested during
stacktrace walking. Failure is injected only if some caller
@@ -84,26 +84,26 @@ configuration of fault-injection capabilities.
Default required range is [0,ULONG_MAX) (whole of virtual address space).
Default rejected range is [0,0).
-- /debug/fail*/stacktrace-depth:
+- /sys/kernel/debug/fail*/stacktrace-depth:
specifies the maximum stacktrace depth walked during search
for a caller within [require-start,require-end) OR
[reject-start,reject-end).
-- /debug/fail_page_alloc/ignore-gfp-highmem:
+- /sys/kernel/debug/fail_page_alloc/ignore-gfp-highmem:
Format: { 'Y' | 'N' }
default is 'N', setting it to 'Y' won't inject failures into
highmem/user allocations.
-- /debug/failslab/ignore-gfp-wait:
-- /debug/fail_page_alloc/ignore-gfp-wait:
+- /sys/kernel/debug/failslab/ignore-gfp-wait:
+- /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait:
Format: { 'Y' | 'N' }
default is 'N', setting it to 'Y' will inject failures
only into non-sleep allocations (GFP_ATOMIC allocations).
-- /debug/fail_page_alloc/min-order:
+- /sys/kernel/debug/fail_page_alloc/min-order:
specifies the minimum page allocation order to be injected
failures.
@@ -166,13 +166,13 @@ o Inject slab allocation failures into module init/exit code
#!/bin/bash
FAILTYPE=failslab
-echo Y > /debug/$FAILTYPE/task-filter
-echo 10 > /debug/$FAILTYPE/probability
-echo 100 > /debug/$FAILTYPE/interval
-echo -1 > /debug/$FAILTYPE/times
-echo 0 > /debug/$FAILTYPE/space
-echo 2 > /debug/$FAILTYPE/verbose
-echo 1 > /debug/$FAILTYPE/ignore-gfp-wait
+echo Y > /sys/kernel/debug/$FAILTYPE/task-filter
+echo 10 > /sys/kernel/debug/$FAILTYPE/probability
+echo 100 > /sys/kernel/debug/$FAILTYPE/interval
+echo -1 > /sys/kernel/debug/$FAILTYPE/times
+echo 0 > /sys/kernel/debug/$FAILTYPE/space
+echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
+echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
faulty_system()
{
@@ -217,20 +217,20 @@ then
exit 1
fi
-cat /sys/module/$module/sections/.text > /debug/$FAILTYPE/require-start
-cat /sys/module/$module/sections/.data > /debug/$FAILTYPE/require-end
+cat /sys/module/$module/sections/.text > /sys/kernel/debug/$FAILTYPE/require-start
+cat /sys/module/$module/sections/.data > /sys/kernel/debug/$FAILTYPE/require-end
-echo N > /debug/$FAILTYPE/task-filter
-echo 10 > /debug/$FAILTYPE/probability
-echo 100 > /debug/$FAILTYPE/interval
-echo -1 > /debug/$FAILTYPE/times
-echo 0 > /debug/$FAILTYPE/space
-echo 2 > /debug/$FAILTYPE/verbose
-echo 1 > /debug/$FAILTYPE/ignore-gfp-wait
-echo 1 > /debug/$FAILTYPE/ignore-gfp-highmem
-echo 10 > /debug/$FAILTYPE/stacktrace-depth
+echo N > /sys/kernel/debug/$FAILTYPE/task-filter
+echo 10 > /sys/kernel/debug/$FAILTYPE/probability
+echo 100 > /sys/kernel/debug/$FAILTYPE/interval
+echo -1 > /sys/kernel/debug/$FAILTYPE/times
+echo 0 > /sys/kernel/debug/$FAILTYPE/space
+echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
+echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
+echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem
+echo 10 > /sys/kernel/debug/$FAILTYPE/stacktrace-depth
-trap "echo 0 > /debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT
+trap "echo 0 > /sys/kernel/debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT
echo "Injecting errors into the module $module... (interrupt to stop)"
sleep 1000000
diff --git a/Documentation/fb/sh7760fb.txt b/Documentation/fb/sh7760fb.txt
index c87bfe5c630..b994c3b1054 100644
--- a/Documentation/fb/sh7760fb.txt
+++ b/Documentation/fb/sh7760fb.txt
@@ -1,7 +1,7 @@
SH7760/SH7763 integrated LCDC Framebuffer driver
================================================
-0. Overwiew
+0. Overview
-----------
The SH7760/SH7763 have an integrated LCD Display controller (LCDC) which
supports (in theory) resolutions ranging from 1x1 to 1024x1024,
diff --git a/Documentation/fb/vesafb.txt b/Documentation/fb/vesafb.txt
index ee277dd204b..950d5a658cb 100644
--- a/Documentation/fb/vesafb.txt
+++ b/Documentation/fb/vesafb.txt
@@ -95,7 +95,7 @@ There is no way to change the vesafb video mode and/or timings after
booting linux. If you are not happy with the 60 Hz refresh rate, you
have these options:
- * configure and load the DOS-Tools for your the graphics board (if
+ * configure and load the DOS-Tools for the graphics board (if
available) and boot linux with loadlin.
* use a native driver (matroxfb/atyfb) instead if vesafb. If none
is available, write a new one!
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index de491a3e231..09e031c5588 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -6,6 +6,20 @@ be removed from this file.
---------------------------
+What: IRQF_SAMPLE_RANDOM
+Check: IRQF_SAMPLE_RANDOM
+When: July 2009
+
+Why: Many of IRQF_SAMPLE_RANDOM users are technically bogus as entropy
+ sources in the kernel's current entropy model. To resolve this, every
+ input point to the kernel's entropy pool needs to better document the
+ type of entropy source it actually is. This will be replaced with
+ additional add_*_randomness functions in drivers/char/random.c
+
+Who: Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com>
+
+---------------------------
+
What: The ieee80211_regdom module parameter
When: March 2010 / desktop catchup
@@ -354,16 +368,6 @@ Who: Krzysztof Piotr Oledzki <ole@ans.pl>
---------------------------
-What: i2c_attach_client(), i2c_detach_client(), i2c_driver->detach_client(),
- i2c_adapter->client_register(), i2c_adapter->client_unregister
-When: 2.6.30
-Check: i2c_attach_client i2c_detach_client
-Why: Deprecated by the new (standard) device driver binding model. Use
- i2c_driver->probe() and ->remove() instead.
-Who: Jean Delvare <khali@linux-fr.org>
-
----------------------------
-
What: fscher and fscpos drivers
When: June 2009
Why: Deprecated by the new fschmd driver.
@@ -437,3 +441,30 @@ Why: Superseded by tdfxfb. I2C/DDC support used to live in a separate
driver but this caused driver conflicts.
Who: Jean Delvare <khali@linux-fr.org>
Krzysztof Helt <krzysztof.h1@wp.pl>
+
+---------------------------
+
+What: CONFIG_RFKILL_INPUT
+When: 2.6.33
+Why: Should be implemented in userspace, policy daemon.
+Who: Johannes Berg <johannes@sipsolutions.net>
+
+----------------------------
+
+What: CONFIG_X86_OLD_MCE
+When: 2.6.32
+Why: Remove the old legacy 32bit machine check code. This has been
+ superseded by the newer machine check code from the 64bit port,
+ but the old version has been kept around for easier testing. Note this
+ doesn't impact the old P5 and WinChip machine check handlers.
+Who: Andi Kleen <andi@firstfloor.org>
+
+----------------------------
+
+What: lock_policy_rwsem_* and unlock_policy_rwsem_* will not be
+ exported interface anymore.
+When: 2.6.33
+Why: cpu_policy_rwsem has a new cleaner definition making it local to
+ cpufreq core and contained inside cpufreq.c. Other dependent
+ drivers should not use it in order to safely avoid lockdep issues.
+Who: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 8dd6db76171..f15621ee559 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -66,6 +66,10 @@ mandatory-locking.txt
- info on the Linux implementation of Sys V mandatory file locking.
ncpfs.txt
- info on Novell Netware(tm) filesystem using NCP protocol.
+nfs41-server.txt
+ - info on the Linux server implementation of NFSv4 minor version 1.
+nfs-rdma.txt
+ - how to install and setup the Linux NFS/RDMA client and server software.
nfsroot.txt
- short guide on setting up a diskless box with NFS root filesystem.
nilfs2.txt
diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt
index bf8080640eb..6208f55c44c 100644
--- a/Documentation/filesystems/9p.txt
+++ b/Documentation/filesystems/9p.txt
@@ -123,6 +123,9 @@ available from the same CVS repository.
There are user and developer mailing lists available through the v9fs project
on sourceforge (http://sourceforge.net/projects/v9fs).
+A stand-alone version of the module (which should build for any 2.6 kernel)
+is available via (http://github.com/ericvh/9p-sac/tree/master)
+
News and other information is maintained on SWiK (http://swik.net/v9fs).
Bug reports may be issued through the kernel.org bugzilla
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 3120f8dd2c3..18b9d0ca063 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -109,27 +109,28 @@ prototypes:
locking rules:
All may block.
- BKL s_lock s_umount
-alloc_inode: no no no
-destroy_inode: no
-dirty_inode: no (must not sleep)
-write_inode: no
-drop_inode: no !!!inode_lock!!!
-delete_inode: no
-put_super: yes yes no
-write_super: no yes read
-sync_fs: no no read
-freeze_fs: ?
-unfreeze_fs: ?
-statfs: no no no
-remount_fs: yes yes maybe (see below)
-clear_inode: no
-umount_begin: yes no no
-show_options: no (vfsmount->sem)
-quota_read: no no no (see below)
-quota_write: no no no (see below)
-
-->remount_fs() will have the s_umount lock if it's already mounted.
+ None have BKL
+ s_umount
+alloc_inode:
+destroy_inode:
+dirty_inode: (must not sleep)
+write_inode:
+drop_inode: !!!inode_lock!!!
+delete_inode:
+put_super: write
+write_super: read
+sync_fs: read
+freeze_fs: read
+unfreeze_fs: read
+statfs: no
+remount_fs: maybe (see below)
+clear_inode:
+umount_begin: no
+show_options: no (namespace_sem)
+quota_read: no (see below)
+quota_write: no (see below)
+
+->remount_fs() will have the s_umount exclusive lock if it's already mounted.
When called from get_sb_single, it does NOT have the s_umount lock.
->quota_read() and ->quota_write() functions are both guaranteed to
be the only ones operating on the quota file by the quota code (via
@@ -187,7 +188,7 @@ readpages: no
write_begin: no locks the page yes
write_end: no yes, unlocks yes
perform_write: no n/a yes
-bmap: yes
+bmap: no
invalidatepage: no yes
releasepage: no yes
direct_IO: no
diff --git a/Documentation/filesystems/afs.txt b/Documentation/filesystems/afs.txt
index 12ad6c7f4e5..ffef91c4e0d 100644
--- a/Documentation/filesystems/afs.txt
+++ b/Documentation/filesystems/afs.txt
@@ -23,15 +23,13 @@ it does support include:
(*) Security (currently only AFS kaserver and KerberosIV tickets).
- (*) File reading.
+ (*) File reading and writing.
(*) Automounting.
-It does not yet support the following AFS features:
-
- (*) Write support.
+ (*) Local caching (via fscache).
- (*) Local caching.
+It does not yet support the following AFS features:
(*) pioctl() system call.
@@ -56,7 +54,7 @@ They permit the debugging messages to be turned on dynamically by manipulating
the masks in the following files:
/sys/module/af_rxrpc/parameters/debug
- /sys/module/afs/parameters/debug
+ /sys/module/kafs/parameters/debug
=====
@@ -66,9 +64,9 @@ USAGE
When inserting the driver modules the root cell must be specified along with a
list of volume location server IP addresses:
- insmod af_rxrpc.o
- insmod rxkad.o
- insmod kafs.o rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
+ modprobe af_rxrpc
+ modprobe rxkad
+ modprobe kafs rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
The first module is the AF_RXRPC network protocol driver. This provides the
RxRPC remote operation protocol and may also be accessed from userspace. See:
@@ -81,7 +79,7 @@ is the actual filesystem driver for the AFS filesystem.
Once the module has been loaded, more modules can be added by the following
procedure:
- echo add grand.central.org 18.7.14.88:128.2.191.224 >/proc/fs/afs/cells
+ echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
Where the parameters to the "add" command are the name of a cell and a list of
volume location servers within that cell, with the latter separated by colons.
@@ -101,7 +99,7 @@ The name of the volume can be suffixes with ".backup" or ".readonly" to
specify connection to only volumes of those types.
The name of the cell is optional, and if not given during a mount, then the
-named volume will be looked up in the cell specified during insmod.
+named volume will be looked up in the cell specified during modprobe.
Additional cells can be added through /proc (see later section).
@@ -163,14 +161,14 @@ THE CELL DATABASE
The filesystem maintains an internal database of all the cells it knows and the
IP addresses of the volume location servers for those cells. The cell to which
-the system belongs is added to the database when insmod is performed by the
+the system belongs is added to the database when modprobe is performed by the
"rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on
the kernel command line.
Further cells can be added by commands similar to the following:
echo add CELLNAME VLADDR[:VLADDR][:VLADDR]... >/proc/fs/afs/cells
- echo add grand.central.org 18.7.14.88:128.2.191.224 >/proc/fs/afs/cells
+ echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
No other cell database operations are available at this time.
@@ -233,7 +231,7 @@ insmod /tmp/kafs.o rootcell=cambridge.redhat.com:172.16.18.91
mount -t afs \%root.afs. /afs
mount -t afs \%cambridge.redhat.com:root.cell. /afs/cambridge.redhat.com/
-echo add grand.central.org 18.7.14.88:128.2.191.224 > /proc/fs/afs/cells
+echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 > /proc/fs/afs/cells
mount -t afs "#grand.central.org:root.cell." /afs/grand.central.org/
mount -t afs "#grand.central.org:root.archive." /afs/grand.central.org/archive
mount -t afs "#grand.central.org:root.contrib." /afs/grand.central.org/contrib
diff --git a/Documentation/filesystems/autofs4-mount-control.txt b/Documentation/filesystems/autofs4-mount-control.txt
index c6341745df3..8f78ded4b64 100644
--- a/Documentation/filesystems/autofs4-mount-control.txt
+++ b/Documentation/filesystems/autofs4-mount-control.txt
@@ -369,7 +369,7 @@ The call requires an initialized struct autofs_dev_ioctl. There are two
possible variations. Both use the path field set to the path of the mount
point to check and the size field adjusted appropriately. One uses the
ioctlfd field to identify a specific mount point to check while the other
-variation uses the path and optionaly arg1 set to an autofs mount type.
+variation uses the path and optionally arg1 set to an autofs mount type.
The call returns 1 if this is a mount point and sets arg1 to the device
number of the mount and field arg2 to the relevant super block magic
number (described below) or 0 if it isn't a mountpoint. In both cases
diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt
index 4db125b3a5c..2666b1ed5e9 100644
--- a/Documentation/filesystems/caching/netfs-api.txt
+++ b/Documentation/filesystems/caching/netfs-api.txt
@@ -184,7 +184,7 @@ This has the following fields:
have index children.
If this function is not supplied or if it returns NULL then the first
- cache in the parent's list will be chosed, or failing that, the first
+ cache in the parent's list will be chosen, or failing that, the first
cache in the master list.
(4) A function to retrieve an object's key from the netfs [mandatory].
diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt
new file mode 100644
index 00000000000..ed52af60c2d
--- /dev/null
+++ b/Documentation/filesystems/debugfs.txt
@@ -0,0 +1,158 @@
+Copyright 2009 Jonathan Corbet <corbet@lwn.net>
+
+Debugfs exists as a simple way for kernel developers to make information
+available to user space. Unlike /proc, which is only meant for information
+about a process, or sysfs, which has strict one-value-per-file rules,
+debugfs has no rules at all. Developers can put any information they want
+there. The debugfs filesystem is also intended to not serve as a stable
+ABI to user space; in theory, there are no stability constraints placed on
+files exported there. The real world is not always so simple, though [1];
+even debugfs interfaces are best designed with the idea that they will need
+to be maintained forever.
+
+Debugfs is typically mounted with a command like:
+
+ mount -t debugfs none /sys/kernel/debug
+
+(Or an equivalent /etc/fstab line).
+
+Note that the debugfs API is exported GPL-only to modules.
+
+Code using debugfs should include <linux/debugfs.h>. Then, the first order
+of business will be to create at least one directory to hold a set of
+debugfs files:
+
+ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
+
+This call, if successful, will make a directory called name underneath the
+indicated parent directory. If parent is NULL, the directory will be
+created in the debugfs root. On success, the return value is a struct
+dentry pointer which can be used to create files in the directory (and to
+clean it up at the end). A NULL return value indicates that something went
+wrong. If ERR_PTR(-ENODEV) is returned, that is an indication that the
+kernel has been built without debugfs support and none of the functions
+described below will work.
+
+The most general way to create a file within a debugfs directory is with:
+
+ struct dentry *debugfs_create_file(const char *name, mode_t mode,
+ struct dentry *parent, void *data,
+ const struct file_operations *fops);
+
+Here, name is the name of the file to create, mode describes the access
+permissions the file should have, parent indicates the directory which
+should hold the file, data will be stored in the i_private field of the
+resulting inode structure, and fops is a set of file operations which
+implement the file's behavior. At a minimum, the read() and/or write()
+operations should be provided; others can be included as needed. Again,
+the return value will be a dentry pointer to the created file, NULL for
+error, or ERR_PTR(-ENODEV) if debugfs support is missing.
+
+In a number of cases, the creation of a set of file operations is not
+actually necessary; the debugfs code provides a number of helper functions
+for simple situations. Files containing a single integer value can be
+created with any of:
+
+ struct dentry *debugfs_create_u8(const char *name, mode_t mode,
+ struct dentry *parent, u8 *value);
+ struct dentry *debugfs_create_u16(const char *name, mode_t mode,
+ struct dentry *parent, u16 *value);
+ struct dentry *debugfs_create_u32(const char *name, mode_t mode,
+ struct dentry *parent, u32 *value);
+ struct dentry *debugfs_create_u64(const char *name, mode_t mode,
+ struct dentry *parent, u64 *value);
+
+These files support both reading and writing the given value; if a specific
+file should not be written to, simply set the mode bits accordingly. The
+values in these files are in decimal; if hexadecimal is more appropriate,
+the following functions can be used instead:
+
+ struct dentry *debugfs_create_x8(const char *name, mode_t mode,
+ struct dentry *parent, u8 *value);
+ struct dentry *debugfs_create_x16(const char *name, mode_t mode,
+ struct dentry *parent, u16 *value);
+ struct dentry *debugfs_create_x32(const char *name, mode_t mode,
+ struct dentry *parent, u32 *value);
+
+Note that there is no debugfs_create_x64().
+
+These functions are useful as long as the developer knows the size of the
+value to be exported. Some types can have different widths on different
+architectures, though, complicating the situation somewhat. There is a
+function meant to help out in one special case:
+
+ struct dentry *debugfs_create_size_t(const char *name, mode_t mode,
+ struct dentry *parent,
+ size_t *value);
+
+As might be expected, this function will create a debugfs file to represent
+a variable of type size_t.
+
+Boolean values can be placed in debugfs with:
+
+ struct dentry *debugfs_create_bool(const char *name, mode_t mode,
+ struct dentry *parent, u32 *value);
+
+A read on the resulting file will yield either Y (for non-zero values) or
+N, followed by a newline. If written to, it will accept either upper- or
+lower-case values, or 1 or 0. Any other input will be silently ignored.
+
+Finally, a block of arbitrary binary data can be exported with:
+
+ struct debugfs_blob_wrapper {
+ void *data;
+ unsigned long size;
+ };
+
+ struct dentry *debugfs_create_blob(const char *name, mode_t mode,
+ struct dentry *parent,
+ struct debugfs_blob_wrapper *blob);
+
+A read of this file will return the data pointed to by the
+debugfs_blob_wrapper structure. Some drivers use "blobs" as a simple way
+to return several lines of (static) formatted text output. This function
+can be used to export binary information, but there does not appear to be
+any code which does so in the mainline. Note that all files created with
+debugfs_create_blob() are read-only.
+
+There are a couple of other directory-oriented helper functions:
+
+ struct dentry *debugfs_rename(struct dentry *old_dir,
+ struct dentry *old_dentry,
+ struct dentry *new_dir,
+ const char *new_name);
+
+ struct dentry *debugfs_create_symlink(const char *name,
+ struct dentry *parent,
+ const char *target);
+
+A call to debugfs_rename() will give a new name to an existing debugfs
+file, possibly in a different directory. The new_name must not exist prior
+to the call; the return value is old_dentry with updated information.
+Symbolic links can be created with debugfs_create_symlink().
+
+There is one important thing that all debugfs users must take into account:
+there is no automatic cleanup of any directories created in debugfs. If a
+module is unloaded without explicitly removing debugfs entries, the result
+will be a lot of stale pointers and no end of highly antisocial behavior.
+So all debugfs users - at least those which can be built as modules - must
+be prepared to remove all files and directories they create there. A file
+can be removed with:
+
+ void debugfs_remove(struct dentry *dentry);
+
+The dentry value can be NULL, in which case nothing will be removed.
+
+Once upon a time, debugfs users were required to remember the dentry
+pointer for every debugfs file they created so that all files could be
+cleaned up. We live in more civilized times now, though, and debugfs users
+can call:
+
+ void debugfs_remove_recursive(struct dentry *dentry);
+
+If this function is passed a pointer for the dentry corresponding to the
+top-level directory, the entire hierarchy below that directory will be
+removed.
+
+Notes:
+ [1] http://lwn.net/Articles/309298/
diff --git a/Documentation/filesystems/ext2.txt b/Documentation/filesystems/ext2.txt
index e055acb6b2d..67639f905f1 100644
--- a/Documentation/filesystems/ext2.txt
+++ b/Documentation/filesystems/ext2.txt
@@ -322,7 +322,7 @@ an upper limit on the block size imposed by the page size of the kernel,
so 8kB blocks are only allowed on Alpha systems (and other architectures
which support larger pages).
-There is an upper limit of 32768 subdirectories in a single directory.
+There is an upper limit of 32000 subdirectories in a single directory.
There is a "soft" upper limit of about 10-15k files in a single directory
with the current linear linked-list directory implementation. This limit
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 97882df0486..7be02ac5fa3 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -235,6 +235,10 @@ minixdf Make 'df' act like Minix.
debug Extra debugging information is sent to syslog.
+abort Simulate the effects of calling ext4_abort() for
+ debugging purposes. This is normally used while
+ remounting a filesystem which is already mounted.
+
errors=remount-ro Remount the filesystem read-only on an error.
errors=continue Keep going on a filesystem error.
errors=panic Panic and halt the machine if an error occurs.
@@ -294,7 +298,7 @@ max_batch_time=usec Maximum amount of time ext4 should wait for
amount of time (on average) that it takes to
finish committing a transaction. Call this time
the "commit time". If the time that the
- transactoin has been running is less than the
+ transaction has been running is less than the
commit time, ext4 will try sleeping for the
commit time to see if other operations will join
the transaction. The commit time is capped by
@@ -328,7 +332,7 @@ noauto_da_alloc replacing existing files via patterns such as
journal commit, in the default data=ordered
mode, the data blocks of the new file are forced
to disk before the rename() operation is
- commited. This provides roughly the same level
+ committed. This provides roughly the same level
of guarantees as ext3, and avoids the
"zero-length" problem that can happen when a
system crashes before the delayed allocation
@@ -358,7 +362,7 @@ written to the journal first, and then to its final location.
In the event of a crash, the journal can be replayed, bringing both data and
metadata into a consistent state. This mode is the slowest except when data
needs to be read from and written to disk at the same time where it
-outperforms all others modes. Curently ext4 does not have delayed
+outperforms all others modes. Currently ext4 does not have delayed
allocation support if this data journalling mode is selected.
References
diff --git a/Documentation/filesystems/fiemap.txt b/Documentation/filesystems/fiemap.txt
index 1e3defcfe50..606233cd461 100644
--- a/Documentation/filesystems/fiemap.txt
+++ b/Documentation/filesystems/fiemap.txt
@@ -204,7 +204,7 @@ fiemap_check_flags() helper:
int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
-The struct fieinfo should be passed in as recieved from ioctl_fiemap(). The
+The struct fieinfo should be passed in as received from ioctl_fiemap(). The
set of fiemap flags which the fs understands should be passed via fs_flags. If
fiemap_check_flags finds invalid user flags, it will place the bad values in
fieinfo->fi_flags and return -EBADR. If the file system gets -EBADR, from
diff --git a/Documentation/filesystems/gfs2-glocks.txt b/Documentation/filesystems/gfs2-glocks.txt
index 4dae9a3840b..0494f78d87e 100644
--- a/Documentation/filesystems/gfs2-glocks.txt
+++ b/Documentation/filesystems/gfs2-glocks.txt
@@ -60,7 +60,7 @@ go_lock | Called for the first local holder of a lock
go_unlock | Called on the final local unlock of a lock
go_dump | Called to print content of object for debugfs file, or on
| error to dump glock to the log.
-go_type; | The type of the glock, LM_TYPE_.....
+go_type | The type of the glock, LM_TYPE_.....
go_min_hold_time | The minimum hold time
The minimum hold time for each lock is the time after a remote lock
diff --git a/Documentation/filesystems/gfs2.txt b/Documentation/filesystems/gfs2.txt
index 593004b6bba..5e3ab8f3bef 100644
--- a/Documentation/filesystems/gfs2.txt
+++ b/Documentation/filesystems/gfs2.txt
@@ -11,18 +11,15 @@ their I/O so file system consistency is maintained. One of the nifty
features of GFS is perfect consistency -- changes made to the file system
on one machine show up immediately on all other machines in the cluster.
-GFS uses interchangable inter-node locking mechanisms. Different lock
-modules can plug into GFS and each file system selects the appropriate
-lock module at mount time. Lock modules include:
+GFS uses interchangable inter-node locking mechanisms, the currently
+supported mechanisms are:
lock_nolock -- allows gfs to be used as a local file system
lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
The dlm is found at linux/fs/dlm/
-In addition to interfacing with an external locking manager, a gfs lock
-module is responsible for interacting with external cluster management
-systems. Lock_dlm depends on user space cluster management systems found
+Lock_dlm depends on user space cluster management systems found
at the URL above.
To use gfs as a local file system, no external clustering systems are
@@ -31,13 +28,19 @@ needed, simply:
$ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
$ mount -t gfs2 /dev/block_device /dir
-GFS2 is not on-disk compatible with previous versions of GFS.
+If you are using Fedora, you need to install the gfs2-utils package
+and, for lock_dlm, you will also need to install the cman package
+and write a cluster.conf as per the documentation.
+
+GFS2 is not on-disk compatible with previous versions of GFS, but it
+is pretty close.
The following man pages can be found at the URL above:
- gfs2_fsck to repair a filesystem
+ fsck.gfs2 to repair a filesystem
gfs2_grow to expand a filesystem online
gfs2_jadd to add journals to a filesystem online
gfs2_tool to manipulate, examine and tune a filesystem
gfs2_quota to examine and change quota values in a filesystem
+ gfs2_convert to convert a gfs filesystem to gfs2 in-place
mount.gfs2 to help mount(8) mount a filesystem
mkfs.gfs2 to make a filesystem
diff --git a/Documentation/filesystems/isofs.txt b/Documentation/filesystems/isofs.txt
index 6973b980ca2..3c367c3b360 100644
--- a/Documentation/filesystems/isofs.txt
+++ b/Documentation/filesystems/isofs.txt
@@ -23,8 +23,13 @@ Mount options unique to the isofs filesystem.
map=off Do not map non-Rock Ridge filenames to lower case
map=normal Map non-Rock Ridge filenames to lower case
map=acorn As map=normal but also apply Acorn extensions if present
- mode=xxx Sets the permissions on files to xxx
- dmode=xxx Sets the permissions on directories to xxx
+ mode=xxx Sets the permissions on files to xxx unless Rock Ridge
+ extensions set the permissions otherwise
+ dmode=xxx Sets the permissions on directories to xxx unless Rock Ridge
+ extensions set the permissions otherwise
+ overriderockperm Set permissions on files and directories according to
+ 'mode' and 'dmode' even though Rock Ridge extensions are
+ present.
nojoliet Ignore Joliet extensions if they are present.
norock Ignore Rock Ridge extensions if they are present.
hide Completely strip hidden files from the file system.
diff --git a/Documentation/filesystems/nfs-rdma.txt b/Documentation/filesystems/nfs-rdma.txt
index 85eaeaddd27..e386f7e4bce 100644
--- a/Documentation/filesystems/nfs-rdma.txt
+++ b/Documentation/filesystems/nfs-rdma.txt
@@ -100,7 +100,7 @@ Installation
$ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
In this location, mount.nfs will be invoked automatically for NFS mounts
- by the system mount commmand.
+ by the system mount command.
NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
on the NFS client machine. You do not need this specific version of
diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt
index 55c4300abfc..01539f41067 100644
--- a/Documentation/filesystems/nilfs2.txt
+++ b/Documentation/filesystems/nilfs2.txt
@@ -39,9 +39,8 @@ Features which NILFS2 does not support yet:
- extended attributes
- POSIX ACLs
- quotas
- - writable snapshots
- - remote backup (CDP)
- - data integrity
+ - fsck
+ - resize
- defragmentation
Mount options
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index ce84cfc9eae..ffead13f944 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -5,11 +5,12 @@
Bodo Bauer <bb@ricochet.net>
2.4.x update Jorge Nerin <comandante@zaralinux.com> November 14 2000
-move /proc/sys Shen Feng <shen@cn.fujitsu.com> April 1 2009
+move /proc/sys Shen Feng <shen@cn.fujitsu.com> April 1 2009
------------------------------------------------------------------------------
Version 1.3 Kernel version 2.2.12
Kernel version 2.4.0-test11-pre4
------------------------------------------------------------------------------
+fixes/update part 1.1 Stefani Seibold <stefani@seibold.net> June 9 2009
Table of Contents
-----------------
@@ -116,7 +117,7 @@ The link self points to the process reading the file system. Each process
subdirectory has the entries listed in Table 1-1.
-Table 1-1: Process specific entries in /proc
+Table 1-1: Process specific entries in /proc
..............................................................................
File Content
clear_refs Clears page referenced bits shown in smaps output
@@ -134,46 +135,103 @@ Table 1-1: Process specific entries in /proc
status Process status in human readable form
wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan
stack Report full stack trace, enable via CONFIG_STACKTRACE
- smaps Extension based on maps, the rss size for each mapped file
+ smaps a extension based on maps, showing the memory consumption of
+ each mapping
..............................................................................
For example, to get the status information of a process, all you have to do is
read the file /proc/PID/status:
- >cat /proc/self/status
- Name: cat
- State: R (running)
- Pid: 5452
- PPid: 743
+ >cat /proc/self/status
+ Name: cat
+ State: R (running)
+ Tgid: 5452
+ Pid: 5452
+ PPid: 743
TracerPid: 0 (2.4)
- Uid: 501 501 501 501
- Gid: 100 100 100 100
- Groups: 100 14 16
- VmSize: 1112 kB
- VmLck: 0 kB
- VmRSS: 348 kB
- VmData: 24 kB
- VmStk: 12 kB
- VmExe: 8 kB
- VmLib: 1044 kB
- SigPnd: 0000000000000000
- SigBlk: 0000000000000000
- SigIgn: 0000000000000000
- SigCgt: 0000000000000000
- CapInh: 00000000fffffeff
- CapPrm: 0000000000000000
- CapEff: 0000000000000000
-
+ Uid: 501 501 501 501
+ Gid: 100 100 100 100
+ FDSize: 256
+ Groups: 100 14 16
+ VmPeak: 5004 kB
+ VmSize: 5004 kB
+ VmLck: 0 kB
+ VmHWM: 476 kB
+ VmRSS: 476 kB
+ VmData: 156 kB
+ VmStk: 88 kB
+ VmExe: 68 kB
+ VmLib: 1412 kB
+ VmPTE: 20 kb
+ Threads: 1
+ SigQ: 0/28578
+ SigPnd: 0000000000000000
+ ShdPnd: 0000000000000000
+ SigBlk: 0000000000000000
+ SigIgn: 0000000000000000
+ SigCgt: 0000000000000000
+ CapInh: 00000000fffffeff
+ CapPrm: 0000000000000000
+ CapEff: 0000000000000000
+ CapBnd: ffffffffffffffff
+ voluntary_ctxt_switches: 0
+ nonvoluntary_ctxt_switches: 1
This shows you nearly the same information you would get if you viewed it with
the ps command. In fact, ps uses the proc file system to obtain its
-information. The statm file contains more detailed information about the
-process memory usage. Its seven fields are explained in Table 1-2. The stat
-file contains details information about the process itself. Its fields are
-explained in Table 1-3.
+information. But you get a more detailed view of the process by reading the
+file /proc/PID/status. It fields are described in table 1-2.
+
+The statm file contains more detailed information about the process
+memory usage. Its seven fields are explained in Table 1-3. The stat file
+contains details information about the process itself. Its fields are
+explained in Table 1-4.
+Table 1-2: Contents of the statm files (as of 2.6.30-rc7)
+..............................................................................
+ Field Content
+ Name filename of the executable
+ State state (R is running, S is sleeping, D is sleeping
+ in an uninterruptible wait, Z is zombie,
+ T is traced or stopped)
+ Tgid thread group ID
+ Pid process id
+ PPid process id of the parent process
+ TracerPid PID of process tracing this process (0 if not)
+ Uid Real, effective, saved set, and file system UIDs
+ Gid Real, effective, saved set, and file system GIDs
+ FDSize number of file descriptor slots currently allocated
+ Groups supplementary group list
+ VmPeak peak virtual memory size
+ VmSize total program size
+ VmLck locked memory size
+ VmHWM peak resident set size ("high water mark")
+ VmRSS size of memory portions
+ VmData size of data, stack, and text segments
+ VmStk size of data, stack, and text segments
+ VmExe size of text segment
+ VmLib size of shared library code
+ VmPTE size of page table entries
+ Threads number of threads
+ SigQ number of signals queued/max. number for queue
+ SigPnd bitmap of pending signals for the thread
+ ShdPnd bitmap of shared pending signals for the process
+ SigBlk bitmap of blocked signals
+ SigIgn bitmap of ignored signals
+ SigCgt bitmap of catched signals
+ CapInh bitmap of inheritable capabilities
+ CapPrm bitmap of permitted capabilities
+ CapEff bitmap of effective capabilities
+ CapBnd bitmap of capabilities bounding set
+ Cpus_allowed mask of CPUs on which this process may run
+ Cpus_allowed_list Same as previous, but in "list format"
+ Mems_allowed mask of memory nodes allowed to this process
+ Mems_allowed_list Same as previous, but in "list format"
+ voluntary_ctxt_switches number of voluntary context switches
+ nonvoluntary_ctxt_switches number of non voluntary context switches
+..............................................................................
-Table 1-2: Contents of the statm files (as of 2.6.8-rc3)
+Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
..............................................................................
Field Content
size total program size (pages) (same as VmSize in status)
@@ -188,7 +246,7 @@ Table 1-2: Contents of the statm files (as of 2.6.8-rc3)
..............................................................................
-Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
+Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
..............................................................................
Field Content
pid process id
@@ -222,10 +280,10 @@ Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
start_stack address of the start of the stack
esp current value of ESP
eip current value of EIP
- pending bitmap of pending signals (obsolete)
- blocked bitmap of blocked signals (obsolete)
- sigign bitmap of ignored signals (obsolete)
- sigcatch bitmap of catched signals (obsolete)
+ pending bitmap of pending signals
+ blocked bitmap of blocked signals
+ sigign bitmap of ignored signals
+ sigcatch bitmap of catched signals
wchan address where process went to sleep
0 (place holder)
0 (place holder)
@@ -234,19 +292,99 @@ Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
rt_priority realtime priority
policy scheduling policy (man sched_setscheduler)
blkio_ticks time spent waiting for block IO
+ gtime guest time of the task in jiffies
+ cgtime guest time of the task children in jiffies
..............................................................................
+The /proc/PID/map file containing the currently mapped memory regions and
+their access permissions.
+
+The format is:
+
+address perms offset dev inode pathname
+
+08048000-08049000 r-xp 00000000 03:00 8312 /opt/test
+08049000-0804a000 rw-p 00001000 03:00 8312 /opt/test
+0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
+a7cb1000-a7cb2000 ---p 00000000 00:00 0
+a7cb2000-a7eb2000 rw-p 00000000 00:00 0
+a7eb2000-a7eb3000 ---p 00000000 00:00 0
+a7eb3000-a7ed5000 rw-p 00000000 00:00 0
+a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6
+a8008000-a800a000 r--p 00133000 03:00 4222 /lib/libc.so.6
+a800a000-a800b000 rw-p 00135000 03:00 4222 /lib/libc.so.6
+a800b000-a800e000 rw-p 00000000 00:00 0
+a800e000-a8022000 r-xp 00000000 03:00 14462 /lib/libpthread.so.0
+a8022000-a8023000 r--p 00013000 03:00 14462 /lib/libpthread.so.0
+a8023000-a8024000 rw-p 00014000 03:00 14462 /lib/libpthread.so.0
+a8024000-a8027000 rw-p 00000000 00:00 0
+a8027000-a8043000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2
+a8043000-a8044000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2
+a8044000-a8045000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2
+aff35000-aff4a000 rw-p 00000000 00:00 0 [stack]
+ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
+
+where "address" is the address space in the process that it occupies, "perms"
+is a set of permissions:
+
+ r = read
+ w = write
+ x = execute
+ s = shared
+ p = private (copy on write)
+
+"offset" is the offset into the mapping, "dev" is the device (major:minor), and
+"inode" is the inode on that device. 0 indicates that no inode is associated
+with the memory region, as the case would be with BSS (uninitialized data).
+The "pathname" shows the name associated file for this mapping. If the mapping
+is not associated with a file:
+
+ [heap] = the heap of the program
+ [stack] = the stack of the main process
+ [vdso] = the "virtual dynamic shared object",
+ the kernel system call handler
+
+ or if empty, the mapping is anonymous.
+
+
+The /proc/PID/smaps is an extension based on maps, showing the memory
+consumption for each of the process's mappings. For each of mappings there
+is a series of lines such as the following:
+
+08048000-080bc000 r-xp 00000000 03:02 13130 /bin/bash
+Size: 1084 kB
+Rss: 892 kB
+Pss: 374 kB
+Shared_Clean: 892 kB
+Shared_Dirty: 0 kB
+Private_Clean: 0 kB
+Private_Dirty: 0 kB
+Referenced: 892 kB
+Swap: 0 kB
+KernelPageSize: 4 kB
+MMUPageSize: 4 kB
+
+The first of these lines shows the same information as is displayed for the
+mapping in /proc/PID/maps. The remaining lines show the size of the mapping,
+the amount of the mapping that is currently resident in RAM, the "proportional
+set size” (divide each shared page by the number of processes sharing it), the
+number of clean and dirty shared pages in the mapping, and the number of clean
+and dirty private pages in the mapping. The "Referenced" indicates the amount
+of memory currently marked as referenced or accessed.
+
+This file is only present if the CONFIG_MMU kernel configuration option is
+enabled.
1.2 Kernel data
---------------
Similar to the process entries, the kernel data files give information about
the running kernel. The files used to obtain this information are contained in
-/proc and are listed in Table 1-4. Not all of these will be present in your
+/proc and are listed in Table 1-5. Not all of these will be present in your
system. It depends on the kernel configuration and the loaded modules, which
files are there, and which are missing.
-Table 1-4: Kernel info in /proc
+Table 1-5: Kernel info in /proc
..............................................................................
File Content
apm Advanced power management info
@@ -283,6 +421,7 @@ Table 1-4: Kernel info in /proc
rtc Real time clock
scsi SCSI info (see text)
slabinfo Slab pool info
+ softirqs softirq usage
stat Overall statistics
swaps Swap space utilization
sys See chapter 2
@@ -366,7 +505,7 @@ just those considered 'most important'. The new vectors are:
RES, CAL, TLB -- rescheduling, call and TLB flush interrupts are
sent from one CPU to another per the needs of the OS. Typically,
their statistics are used by kernel developers and interested users to
- determine the occurance of interrupt of the given type.
+ determine the occurrence of interrupts of the given type.
The above IRQ vectors are displayed only when relevent. For example,
the threshold vector does not exist on x86_64 platforms. Others are
@@ -551,7 +690,7 @@ Committed_AS: The amount of memory presently allocated on the system.
memory once that memory has been successfully allocated.
VmallocTotal: total size of vmalloc memory area
VmallocUsed: amount of vmalloc area which is used
-VmallocChunk: largest contigious block of vmalloc area which is free
+VmallocChunk: largest contiguous block of vmalloc area which is free
..............................................................................
@@ -597,6 +736,25 @@ on the kind of area :
0xffffffffa0017000-0xffffffffa0022000 45056 sys_init_module+0xc27/0x1d00 ...
pages=10 vmalloc N0=10
+..............................................................................
+
+softirqs:
+
+Provides counts of softirq handlers serviced since boot time, for each cpu.
+
+> cat /proc/softirqs
+ CPU0 CPU1 CPU2 CPU3
+ HI: 0 0 0 0
+ TIMER: 27166 27120 27097 27034
+ NET_TX: 0 0 0 17
+ NET_RX: 42 0 0 39
+ BLOCK: 0 0 107 1121
+ TASKLET: 0 0 0 290
+ SCHED: 27035 26983 26971 26746
+ HRTIMER: 0 0 0 0
+ RCU: 1678 1769 2178 2250
+
+
1.3 IDE devices in /proc/ide
----------------------------
@@ -614,10 +772,10 @@ IDE devices:
More detailed information can be found in the controller specific
subdirectories. These are named ide0, ide1 and so on. Each of these
-directories contains the files shown in table 1-5.
+directories contains the files shown in table 1-6.
-Table 1-5: IDE controller info in /proc/ide/ide?
+Table 1-6: IDE controller info in /proc/ide/ide?
..............................................................................
File Content
channel IDE channel (0 or 1)
@@ -627,11 +785,11 @@ Table 1-5: IDE controller info in /proc/ide/ide?
..............................................................................
Each device connected to a controller has a separate subdirectory in the
-controllers directory. The files listed in table 1-6 are contained in these
+controllers directory. The files listed in table 1-7 are contained in these
directories.
-Table 1-6: IDE device information
+Table 1-7: IDE device information
..............................................................................
File Content
cache The cache
@@ -673,12 +831,12 @@ the drive parameters:
1.4 Networking info in /proc/net
--------------------------------
-The subdirectory /proc/net follows the usual pattern. Table 1-6 shows the
+The subdirectory /proc/net follows the usual pattern. Table 1-8 shows the
additional values you get for IP version 6 if you configure the kernel to
-support this. Table 1-7 lists the files and their meaning.
+support this. Table 1-9 lists the files and their meaning.
-Table 1-6: IPv6 info in /proc/net
+Table 1-8: IPv6 info in /proc/net
..............................................................................
File Content
udp6 UDP sockets (IPv6)
@@ -693,7 +851,7 @@ Table 1-6: IPv6 info in /proc/net
..............................................................................
-Table 1-7: Network info in /proc/net
+Table 1-9: Network info in /proc/net
..............................................................................
File Content
arp Kernel ARP table
@@ -817,10 +975,10 @@ The directory /proc/parport contains information about the parallel ports of
your system. It has one subdirectory for each port, named after the port
number (0,1,2,...).
-These directories contain the four files shown in Table 1-8.
+These directories contain the four files shown in Table 1-10.
-Table 1-8: Files in /proc/parport
+Table 1-10: Files in /proc/parport
..............................................................................
File Content
autoprobe Any IEEE-1284 device ID information that has been acquired.
@@ -838,10 +996,10 @@ Table 1-8: Files in /proc/parport
Information about the available and actually used tty's can be found in the
directory /proc/tty.You'll find entries for drivers and line disciplines in
-this directory, as shown in Table 1-9.
+this directory, as shown in Table 1-11.
-Table 1-9: Files in /proc/tty
+Table 1-11: Files in /proc/tty
..............................................................................
File Content
drivers list of drivers and their usage
@@ -883,6 +1041,7 @@ since the system first booted. For a quick look, simply cat the file:
processes 2915
procs_running 1
procs_blocked 0
+ softirq 183433 0 21755 12 39 1137 231 21459 2263
The very first "cpu" line aggregates the numbers in all of the other "cpuN"
lines. These numbers identify the amount of time the CPU has spent performing
@@ -918,6 +1077,11 @@ CPUs.
The "procs_blocked" line gives the number of processes currently blocked,
waiting for I/O to complete.
+The "softirq" line gives counts of softirqs serviced since boot time, for each
+of the possible system softirqs. The first column is the total of all
+softirqs serviced; each subsequent column is the total for that particular
+softirq.
+
1.9 Ext4 file system parameters
------------------------------
@@ -926,9 +1090,9 @@ Information about mounted ext4 file systems can be found in
/proc/fs/ext4. Each mounted filesystem will have a directory in
/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or
/proc/fs/ext4/dm-0). The files in each per-device directory are shown
-in Table 1-10, below.
+in Table 1-12, below.
-Table 1-10: Files in /proc/fs/ext4/<devname>
+Table 1-12: Files in /proc/fs/ext4/<devname>
..............................................................................
File Content
mb_groups details of multiblock allocator buddy cache of free blocks
diff --git a/Documentation/filesystems/sysfs-pci.txt b/Documentation/filesystems/sysfs-pci.txt
index 26e4b8bc53e..85354b32d73 100644
--- a/Documentation/filesystems/sysfs-pci.txt
+++ b/Documentation/filesystems/sysfs-pci.txt
@@ -72,7 +72,7 @@ The 'rom' file is special in that it provides read-only access to the device's
ROM file, if available. It's disabled by default, however, so applications
should write the string "1" to the file to enable it before attempting a read
call, and disable it following the access by writing "0" to the file. Note
-that the device must be enabled for a rom read to return data succesfully.
+that the device must be enabled for a rom read to return data successfully.
In the event a driver is not bound to the device, it can be enabled using the
'enable' file, documented above.
diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
index 7e81e37c0b1..b245d524d56 100644
--- a/Documentation/filesystems/sysfs.txt
+++ b/Documentation/filesystems/sysfs.txt
@@ -23,7 +23,8 @@ interface.
Using sysfs
~~~~~~~~~~~
-sysfs is always compiled in. You can access it by doing:
+sysfs is always compiled in if CONFIG_SYSFS is defined. You can access
+it by doing:
mount -t sysfs sysfs /sys
diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt
index 3a5ddc96901..b58b84b50fa 100644
--- a/Documentation/filesystems/vfat.txt
+++ b/Documentation/filesystems/vfat.txt
@@ -124,14 +124,19 @@ sys_immutable -- If set, ATTR_SYS attribute on FAT is handled as
flush -- If set, the filesystem will try to flush to disk more
early than normal. Not set by default.
-rodir -- FAT has the ATTR_RO (read-only) attribute. But on Windows,
- the ATTR_RO of the directory will be just ignored actually,
- and is used by only applications as flag. E.g. it's setted
- for the customized folder.
+rodir -- FAT has the ATTR_RO (read-only) attribute. On Windows,
+ the ATTR_RO of the directory will just be ignored,
+ and is used only by applications as a flag (e.g. it's set
+ for the customized folder).
If you want to use ATTR_RO as read-only flag even for
the directory, set this option.
+errors=panic|continue|remount-ro
+ -- specify FAT behavior on critical errors: panic, continue
+ without doing anything or remount the partition in
+ read-only mode (default behavior).
+
<bool>: 0,1,yes,no,true,false
TODO
diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README
index c3480aa66ba..7eceaff63f5 100644
--- a/Documentation/firmware_class/README
+++ b/Documentation/firmware_class/README
@@ -77,7 +77,8 @@
seconds for the whole load operation.
- request_firmware_nowait() is also provided for convenience in
- non-user contexts.
+ user contexts to request firmware asynchronously, but can't be called
+ in atomic contexts.
about in-kernel persistence:
diff --git a/Documentation/futex-requeue-pi.txt b/Documentation/futex-requeue-pi.txt
new file mode 100644
index 00000000000..9dc1ff4fd53
--- /dev/null
+++ b/Documentation/futex-requeue-pi.txt
@@ -0,0 +1,131 @@
+Futex Requeue PI
+----------------
+
+Requeueing of tasks from a non-PI futex to a PI futex requires
+special handling in order to ensure the underlying rt_mutex is never
+left without an owner if it has waiters; doing so would break the PI
+boosting logic [see rt-mutex-desgin.txt] For the purposes of
+brevity, this action will be referred to as "requeue_pi" throughout
+this document. Priority inheritance is abbreviated throughout as
+"PI".
+
+Motivation
+----------
+
+Without requeue_pi, the glibc implementation of
+pthread_cond_broadcast() must resort to waking all the tasks waiting
+on a pthread_condvar and letting them try to sort out which task
+gets to run first in classic thundering-herd formation. An ideal
+implementation would wake the highest-priority waiter, and leave the
+rest to the natural wakeup inherent in unlocking the mutex
+associated with the condvar.
+
+Consider the simplified glibc calls:
+
+/* caller must lock mutex */
+pthread_cond_wait(cond, mutex)
+{
+ lock(cond->__data.__lock);
+ unlock(mutex);
+ do {
+ unlock(cond->__data.__lock);
+ futex_wait(cond->__data.__futex);
+ lock(cond->__data.__lock);
+ } while(...)
+ unlock(cond->__data.__lock);
+ lock(mutex);
+}
+
+pthread_cond_broadcast(cond)
+{
+ lock(cond->__data.__lock);
+ unlock(cond->__data.__lock);
+ futex_requeue(cond->data.__futex, cond->mutex);
+}
+
+Once pthread_cond_broadcast() requeues the tasks, the cond->mutex
+has waiters. Note that pthread_cond_wait() attempts to lock the
+mutex only after it has returned to user space. This will leave the
+underlying rt_mutex with waiters, and no owner, breaking the
+previously mentioned PI-boosting algorithms.
+
+In order to support PI-aware pthread_condvar's, the kernel needs to
+be able to requeue tasks to PI futexes. This support implies that
+upon a successful futex_wait system call, the caller would return to
+user space already holding the PI futex. The glibc implementation
+would be modified as follows:
+
+
+/* caller must lock mutex */
+pthread_cond_wait_pi(cond, mutex)
+{
+ lock(cond->__data.__lock);
+ unlock(mutex);
+ do {
+ unlock(cond->__data.__lock);
+ futex_wait_requeue_pi(cond->__data.__futex);
+ lock(cond->__data.__lock);
+ } while(...)
+ unlock(cond->__data.__lock);
+ /* the kernel acquired the the mutex for us */
+}
+
+pthread_cond_broadcast_pi(cond)
+{
+ lock(cond->__data.__lock);
+ unlock(cond->__data.__lock);
+ futex_requeue_pi(cond->data.__futex, cond->mutex);
+}
+
+The actual glibc implementation will likely test for PI and make the
+necessary changes inside the existing calls rather than creating new
+calls for the PI cases. Similar changes are needed for
+pthread_cond_timedwait() and pthread_cond_signal().
+
+Implementation
+--------------
+
+In order to ensure the rt_mutex has an owner if it has waiters, it
+is necessary for both the requeue code, as well as the waiting code,
+to be able to acquire the rt_mutex before returning to user space.
+The requeue code cannot simply wake the waiter and leave it to
+acquire the rt_mutex as it would open a race window between the
+requeue call returning to user space and the waiter waking and
+starting to run. This is especially true in the uncontended case.
+
+The solution involves two new rt_mutex helper routines,
+rt_mutex_start_proxy_lock() and rt_mutex_finish_proxy_lock(), which
+allow the requeue code to acquire an uncontended rt_mutex on behalf
+of the waiter and to enqueue the waiter on a contended rt_mutex.
+Two new system calls provide the kernel<->user interface to
+requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_REQUEUE_CMP_PI.
+
+FUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait()
+and pthread_cond_timedwait()) to block on the initial futex and wait
+to be requeued to a PI-aware futex. The implementation is the
+result of a high-speed collision between futex_wait() and
+futex_lock_pi(), with some extra logic to check for the additional
+wake-up scenarios.
+
+FUTEX_REQUEUE_CMP_PI is called by the waker
+(pthread_cond_broadcast() and pthread_cond_signal()) to requeue and
+possibly wake the waiting tasks. Internally, this system call is
+still handled by futex_requeue (by passing requeue_pi=1). Before
+requeueing, futex_requeue() attempts to acquire the requeue target
+PI futex on behalf of the top waiter. If it can, this waiter is
+woken. futex_requeue() then proceeds to requeue the remaining
+nr_wake+nr_requeue tasks to the PI futex, calling
+rt_mutex_start_proxy_lock() prior to each requeue to prepare the
+task as a waiter on the underlying rt_mutex. It is possible that
+the lock can be acquired at this stage as well, if so, the next
+waiter is woken to finish the acquisition of the lock.
+
+FUTEX_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but
+their sum is all that really matters. futex_requeue() will wake or
+requeue up to nr_wake + nr_requeue tasks. It will wake only as many
+tasks as it can acquire the lock for, which in the majority of cases
+should be 0 as good programming practice dictates that the caller of
+either pthread_cond_broadcast() or pthread_cond_signal() acquire the
+mutex prior to making the call. FUTEX_REQUEUE_PI requires that
+nr_wake=1. nr_requeue should be INT_MAX for broadcast and 0 for
+signal.
diff --git a/Documentation/gcov.txt b/Documentation/gcov.txt
new file mode 100644
index 00000000000..40ec6335276
--- /dev/null
+++ b/Documentation/gcov.txt
@@ -0,0 +1,253 @@
+Using gcov with the Linux kernel
+================================
+
+1. Introduction
+2. Preparation
+3. Customization
+4. Files
+5. Modules
+6. Separated build and test machines
+7. Troubleshooting
+Appendix A: sample script: gather_on_build.sh
+Appendix B: sample script: gather_on_test.sh
+
+
+1. Introduction
+===============
+
+gcov profiling kernel support enables the use of GCC's coverage testing
+tool gcov [1] with the Linux kernel. Coverage data of a running kernel
+is exported in gcov-compatible format via the "gcov" debugfs directory.
+To get coverage data for a specific file, change to the kernel build
+directory and use gcov with the -o option as follows (requires root):
+
+# cd /tmp/linux-out
+# gcov -o /sys/kernel/debug/gcov/tmp/linux-out/kernel spinlock.c
+
+This will create source code files annotated with execution counts
+in the current directory. In addition, graphical gcov front-ends such
+as lcov [2] can be used to automate the process of collecting data
+for the entire kernel and provide coverage overviews in HTML format.
+
+Possible uses:
+
+* debugging (has this line been reached at all?)
+* test improvement (how do I change my test to cover these lines?)
+* minimizing kernel configurations (do I need this option if the
+ associated code is never run?)
+
+--
+
+[1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
+[2] http://ltp.sourceforge.net/coverage/lcov.php
+
+
+2. Preparation
+==============
+
+Configure the kernel with:
+
+ CONFIG_DEBUGFS=y
+ CONFIG_GCOV_KERNEL=y
+
+and to get coverage data for the entire kernel:
+
+ CONFIG_GCOV_PROFILE_ALL=y
+
+Note that kernels compiled with profiling flags will be significantly
+larger and run slower. Also CONFIG_GCOV_PROFILE_ALL may not be supported
+on all architectures.
+
+Profiling data will only become accessible once debugfs has been
+mounted:
+
+ mount -t debugfs none /sys/kernel/debug
+
+
+3. Customization
+================
+
+To enable profiling for specific files or directories, add a line
+similar to the following to the respective kernel Makefile:
+
+ For a single file (e.g. main.o):
+ GCOV_PROFILE_main.o := y
+
+ For all files in one directory:
+ GCOV_PROFILE := y
+
+To exclude files from being profiled even when CONFIG_GCOV_PROFILE_ALL
+is specified, use:
+
+ GCOV_PROFILE_main.o := n
+ and:
+ GCOV_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as
+kernel modules are supported by this mechanism.
+
+
+4. Files
+========
+
+The gcov kernel support creates the following files in debugfs:
+
+ /sys/kernel/debug/gcov
+ Parent directory for all gcov-related files.
+
+ /sys/kernel/debug/gcov/reset
+ Global reset file: resets all coverage data to zero when
+ written to.
+
+ /sys/kernel/debug/gcov/path/to/compile/dir/file.gcda
+ The actual gcov data file as understood by the gcov
+ tool. Resets file coverage data to zero when written to.
+
+ /sys/kernel/debug/gcov/path/to/compile/dir/file.gcno
+ Symbolic link to a static data file required by the gcov
+ tool. This file is generated by gcc when compiling with
+ option -ftest-coverage.
+
+
+5. Modules
+==========
+
+Kernel modules may contain cleanup code which is only run during
+module unload time. The gcov mechanism provides a means to collect
+coverage data for such code by keeping a copy of the data associated
+with the unloaded module. This data remains available through debugfs.
+Once the module is loaded again, the associated coverage counters are
+initialized with the data from its previous instantiation.
+
+This behavior can be deactivated by specifying the gcov_persist kernel
+parameter:
+
+ gcov_persist=0
+
+At run-time, a user can also choose to discard data for an unloaded
+module by writing to its data file or the global reset file.
+
+
+6. Separated build and test machines
+====================================
+
+The gcov kernel profiling infrastructure is designed to work out-of-the
+box for setups where kernels are built and run on the same machine. In
+cases where the kernel runs on a separate machine, special preparations
+must be made, depending on where the gcov tool is used:
+
+a) gcov is run on the TEST machine
+
+The gcov tool version on the test machine must be compatible with the
+gcc version used for kernel build. Also the following files need to be
+copied from build to test machine:
+
+from the source tree:
+ - all C source files + headers
+
+from the build tree:
+ - all C source files + headers
+ - all .gcda and .gcno files
+ - all links to directories
+
+It is important to note that these files need to be placed into the
+exact same file system location on the test machine as on the build
+machine. If any of the path components is symbolic link, the actual
+directory needs to be used instead (due to make's CURDIR handling).
+
+b) gcov is run on the BUILD machine
+
+The following files need to be copied after each test case from test
+to build machine:
+
+from the gcov directory in sysfs:
+ - all .gcda files
+ - all links to .gcno files
+
+These files can be copied to any location on the build machine. gcov
+must then be called with the -o option pointing to that directory.
+
+Example directory setup on the build machine:
+
+ /tmp/linux: kernel source tree
+ /tmp/out: kernel build directory as specified by make O=
+ /tmp/coverage: location of the files copied from the test machine
+
+ [user@build] cd /tmp/out
+ [user@build] gcov -o /tmp/coverage/tmp/out/init main.c
+
+
+7. Troubleshooting
+==================
+
+Problem: Compilation aborts during linker step.
+Cause: Profiling flags are specified for source files which are not
+ linked to the main kernel or which are linked by a custom
+ linker procedure.
+Solution: Exclude affected source files from profiling by specifying
+ GCOV_PROFILE := n or GCOV_PROFILE_basename.o := n in the
+ corresponding Makefile.
+
+Problem: Files copied from sysfs appear empty or incomplete.
+Cause: Due to the way seq_file works, some tools such as cp or tar
+ may not correctly copy files from sysfs.
+Solution: Use 'cat' to read .gcda files and 'cp -d' to copy links.
+ Alternatively use the mechanism shown in Appendix B.
+
+
+Appendix A: gather_on_build.sh
+==============================
+
+Sample script to gather coverage meta files on the build machine
+(see 6a):
+#!/bin/bash
+
+KSRC=$1
+KOBJ=$2
+DEST=$3
+
+if [ -z "$KSRC" ] || [ -z "$KOBJ" ] || [ -z "$DEST" ]; then
+ echo "Usage: $0 <ksrc directory> <kobj directory> <output.tar.gz>" >&2
+ exit 1
+fi
+
+KSRC=$(cd $KSRC; printf "all:\n\t@echo \${CURDIR}\n" | make -f -)
+KOBJ=$(cd $KOBJ; printf "all:\n\t@echo \${CURDIR}\n" | make -f -)
+
+find $KSRC $KOBJ \( -name '*.gcno' -o -name '*.[ch]' -o -type l \) -a \
+ -perm /u+r,g+r | tar cfz $DEST -P -T -
+
+if [ $? -eq 0 ] ; then
+ echo "$DEST successfully created, copy to test system and unpack with:"
+ echo " tar xfz $DEST -P"
+else
+ echo "Could not create file $DEST"
+fi
+
+
+Appendix B: gather_on_test.sh
+=============================
+
+Sample script to gather coverage data files on the test machine
+(see 6b):
+
+#!/bin/bash -e
+
+DEST=$1
+GCDA=/sys/kernel/debug/gcov
+
+if [ -z "$DEST" ] ; then
+ echo "Usage: $0 <output.tar.gz>" >&2
+ exit 1
+fi
+
+TEMPDIR=$(mktemp -d)
+echo Collecting data..
+find $GCDA -type d -exec mkdir -p $TEMPDIR/\{\} \;
+find $GCDA -name '*.gcda' -exec sh -c 'cat < $0 > '$TEMPDIR'/$0' {} \;
+find $GCDA -name '*.gcno' -exec sh -c 'cp -d $0 '$TEMPDIR'/$0' {} \;
+tar czf $DEST -C $TEMPDIR sys
+rm -rf $TEMPDIR
+
+echo "$DEST successfully created, copy to build system and unpack with:"
+echo " tar xfz $DEST"
diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt
index 145c25a170c..e4b6985044a 100644
--- a/Documentation/gpio.txt
+++ b/Documentation/gpio.txt
@@ -458,7 +458,7 @@ debugfs interface, since it provides control over GPIO direction and
value instead of just showing a gpio state summary. Plus, it could be
present on production systems without debugging support.
-Given approprate hardware documentation for the system, userspace could
+Given appropriate hardware documentation for the system, userspace could
know for example that GPIO #23 controls the write protect line used to
protect boot loader segments in flash memory. System upgrade procedures
may need to temporarily remove that protection, first importing a GPIO,
diff --git a/Documentation/hwmon/f71882fg b/Documentation/hwmon/f71882fg
index a8321267b5b..bee4c30bc1e 100644
--- a/Documentation/hwmon/f71882fg
+++ b/Documentation/hwmon/f71882fg
@@ -2,14 +2,18 @@ Kernel driver f71882fg
======================
Supported chips:
- * Fintek F71882FG and F71883FG
- Prefix: 'f71882fg'
+ * Fintek F71858FG
+ Prefix: 'f71858fg'
Addresses scanned: none, address read from Super I/O config space
Datasheet: Available from the Fintek website
* Fintek F71862FG and F71863FG
Prefix: 'f71862fg'
Addresses scanned: none, address read from Super I/O config space
Datasheet: Available from the Fintek website
+ * Fintek F71882FG and F71883FG
+ Prefix: 'f71882fg'
+ Addresses scanned: none, address read from Super I/O config space
+ Datasheet: Available from the Fintek website
* Fintek F8000
Prefix: 'f8000'
Addresses scanned: none, address read from Super I/O config space
@@ -66,13 +70,13 @@ printed when loading the driver.
Three different fan control modes are supported; the mode number is written
to the pwm#_enable file. Note that not all modes are supported on all
-chips, and some modes may only be available in RPM / PWM mode on the F8000.
+chips, and some modes may only be available in RPM / PWM mode.
Writing an unsupported mode will result in an invalid parameter error.
* 1: Manual mode
You ask for a specific PWM duty cycle / DC voltage or a specific % of
fan#_full_speed by writing to the pwm# file. This mode is only
- available on the F8000 if the fan channel is in RPM mode.
+ available on the F71858FG / F8000 if the fan channel is in RPM mode.
* 2: Normal auto mode
You can define a number of temperature/fan speed trip points, which % the
diff --git a/Documentation/hwmon/ibmaem b/Documentation/hwmon/ibmaem
index e98bdfea346..1e0d59e000b 100644
--- a/Documentation/hwmon/ibmaem
+++ b/Documentation/hwmon/ibmaem
@@ -7,7 +7,7 @@ henceforth as AEM.
Supported systems:
* Any recent IBM System X server with AEM support.
This includes the x3350, x3550, x3650, x3655, x3755, x3850 M2,
- x3950 M2, and certain HS2x/LS2x/QS2x blades. The IPMI host interface
+ x3950 M2, and certain HC10/HS2x/LS2x/QS2x blades. The IPMI host interface
driver ("ipmi-si") needs to be loaded for this driver to do anything.
Prefix: 'ibmaem'
Datasheet: Not available
diff --git a/Documentation/hwmon/sysfs-interface b/Documentation/hwmon/sysfs-interface
index 004ee161721..dcbd502c879 100644
--- a/Documentation/hwmon/sysfs-interface
+++ b/Documentation/hwmon/sysfs-interface
@@ -70,6 +70,7 @@ are interpreted as 0! For more on how written strings are interpreted see the
[0-*] denotes any positive number starting from 0
[1-*] denotes any positive number starting from 1
RO read only value
+WO write only value
RW read/write value
Read/write values may be read-only for some chips, depending on the
@@ -295,6 +296,24 @@ temp[1-*]_label Suggested temperature channel label.
user-space.
RO
+temp[1-*]_lowest
+ Historical minimum temperature
+ Unit: millidegree Celsius
+ RO
+
+temp[1-*]_highest
+ Historical maximum temperature
+ Unit: millidegree Celsius
+ RO
+
+temp[1-*]_reset_history
+ Reset temp_lowest and temp_highest
+ WO
+
+temp_reset_history
+ Reset temp_lowest and temp_highest for all sensors
+ WO
+
Some chips measure temperature using external thermistors and an ADC, and
report the temperature measurement as a voltage. Converting this voltage
back to a temperature (or the other way around for limits) requires
diff --git a/Documentation/hwmon/tmp401 b/Documentation/hwmon/tmp401
new file mode 100644
index 00000000000..9fc44724921
--- /dev/null
+++ b/Documentation/hwmon/tmp401
@@ -0,0 +1,42 @@
+Kernel driver tmp401
+====================
+
+Supported chips:
+ * Texas Instruments TMP401
+ Prefix: 'tmp401'
+ Addresses scanned: I2C 0x4c
+ Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp401.html
+ * Texas Instruments TMP411
+ Prefix: 'tmp411'
+ Addresses scanned: I2C 0x4c
+ Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp411.html
+
+Authors:
+ Hans de Goede <hdegoede@redhat.com>
+ Andre Prendel <andre.prendel@gmx.de>
+
+Description
+-----------
+
+This driver implements support for Texas Instruments TMP401 and
+TMP411 chips. These chips implements one remote and one local
+temperature sensor. Temperature is measured in degrees
+Celsius. Resolution of the remote sensor is 0.0625 degree. Local
+sensor resolution can be set to 0.5, 0.25, 0.125 or 0.0625 degree (not
+supported by the driver so far, so using the default resolution of 0.5
+degree).
+
+The driver provides the common sysfs-interface for temperatures (see
+/Documentation/hwmon/sysfs-interface under Temperatures).
+
+The TMP411 chip is compatible with TMP401. It provides some additional
+features.
+
+* Minimum and Maximum temperature measured since power-on, chip-reset
+
+ Exported via sysfs attributes tempX_lowest and tempX_highest.
+
+* Reset of historical minimum/maximum temperature measurements
+
+ Exported via sysfs attribute temp_reset_history. Writing 1 to this
+ file triggers a reset.
diff --git a/Documentation/hwmon/w83627ehf b/Documentation/hwmon/w83627ehf
index b6eb59384bb..02b74899eda 100644
--- a/Documentation/hwmon/w83627ehf
+++ b/Documentation/hwmon/w83627ehf
@@ -12,6 +12,10 @@ Supported chips:
Addresses scanned: ISA address retrieved from Super I/O registers
Datasheet:
http://www.nuvoton.com.tw/NR/rdonlyres/7885623D-A487-4CF9-A47F-30C5F73D6FE6/0/W83627DHG.pdf
+ * Winbond W83627DHG-P
+ Prefix: 'w83627dhg'
+ Addresses scanned: ISA address retrieved from Super I/O registers
+ Datasheet: not available
* Winbond W83667HG
Prefix: 'w83667hg'
Addresses scanned: ISA address retrieved from Super I/O registers
@@ -28,8 +32,8 @@ Description
-----------
This driver implements support for the Winbond W83627EHF, W83627EHG,
-W83627DHG and W83667HG super I/O chips. We will refer to them collectively
-as Winbond chips.
+W83627DHG, W83627DHG-P and W83667HG super I/O chips. We will refer to them
+collectively as Winbond chips.
The chips implement three temperature sensors, five fan rotation
speed sensors, ten analog voltage sensors (only nine for the 627DHG), one
@@ -135,3 +139,6 @@ done in the driver for all register addresses.
The DHG also supports PECI, where the DHG queries Intel CPU temperatures, and
the ICH8 southbridge gets that data via PECI from the DHG, so that the
southbridge drives the fans. And the DHG supports SST, a one-wire serial bus.
+
+The DHG-P has an additional automatic fan speed control mode named Smart Fan
+(TM) III+. This mode is not yet supported by the driver.
diff --git a/Documentation/i2c/busses/i2c-ocores b/Documentation/i2c/busses/i2c-ocores
index cfcebb10d14..c269aaa2f26 100644
--- a/Documentation/i2c/busses/i2c-ocores
+++ b/Documentation/i2c/busses/i2c-ocores
@@ -20,6 +20,8 @@ platform_device with the base address and interrupt number. The
dev.platform_data of the device should also point to a struct
ocores_i2c_platform_data (see linux/i2c-ocores.h) describing the
distance between registers and the input clock speed.
+There is also a possibility to attach a list of i2c_board_info which
+the i2c-ocores driver will add to the bus upon creation.
E.G. something like:
@@ -36,9 +38,24 @@ static struct resource ocores_resources[] = {
},
};
+/* optional board info */
+struct i2c_board_info ocores_i2c_board_info[] = {
+ {
+ I2C_BOARD_INFO("tsc2003", 0x48),
+ .platform_data = &tsc2003_platform_data,
+ .irq = TSC_IRQ
+ },
+ {
+ I2C_BOARD_INFO("adv7180", 0x42 >> 1),
+ .irq = ADV_IRQ
+ }
+};
+
static struct ocores_i2c_platform_data myi2c_data = {
.regstep = 2, /* two bytes between registers */
.clock_khz = 50000, /* input clock of 50MHz */
+ .devices = ocores_i2c_board_info, /* optional table of devices */
+ .num_devices = ARRAY_SIZE(ocores_i2c_board_info), /* table size */
};
static struct platform_device myi2c = {
diff --git a/Documentation/i2c/busses/i2c-viapro b/Documentation/i2c/busses/i2c-viapro
index 22efedf60c8..2e758b0e945 100644
--- a/Documentation/i2c/busses/i2c-viapro
+++ b/Documentation/i2c/busses/i2c-viapro
@@ -19,6 +19,9 @@ Supported adapters:
* VIA Technologies, Inc. VX800/VX820
Datasheet: available on http://linux.via.com.tw
+ * VIA Technologies, Inc. VX855/VX875
+ Datasheet: Availability unknown
+
Authors:
Kyösti Mälkki <kmalkki@cc.hut.fi>,
Mark D. Studebaker <mdsxyz123@yahoo.com>,
@@ -53,6 +56,7 @@ Your lspci -n listing must show one of these :
device 1106:3287 (VT8251)
device 1106:8324 (CX700)
device 1106:8353 (VX800/VX820)
+ device 1106:8409 (VX855/VX875)
If none of these show up, you should look in the BIOS for settings like
enable ACPI / SMBus or even USB.
diff --git a/Documentation/i2c/instantiating-devices b/Documentation/i2c/instantiating-devices
index b55ce57a84d..c740b7b4108 100644
--- a/Documentation/i2c/instantiating-devices
+++ b/Documentation/i2c/instantiating-devices
@@ -165,3 +165,47 @@ was done there. Two significant differences are:
Once again, method 3 should be avoided wherever possible. Explicit device
instantiation (methods 1 and 2) is much preferred for it is safer and
faster.
+
+
+Method 4: Instantiate from user-space
+-------------------------------------
+
+In general, the kernel should know which I2C devices are connected and
+what addresses they live at. However, in certain cases, it does not, so a
+sysfs interface was added to let the user provide the information. This
+interface is made of 2 attribute files which are created in every I2C bus
+directory: new_device and delete_device. Both files are write only and you
+must write the right parameters to them in order to properly instantiate,
+respectively delete, an I2C device.
+
+File new_device takes 2 parameters: the name of the I2C device (a string)
+and the address of the I2C device (a number, typically expressed in
+hexadecimal starting with 0x, but can also be expressed in decimal.)
+
+File delete_device takes a single parameter: the address of the I2C
+device. As no two devices can live at the same address on a given I2C
+segment, the address is sufficient to uniquely identify the device to be
+deleted.
+
+Example:
+# echo eeprom 0x50 > /sys/class/i2c-adapter/i2c-3/new_device
+
+While this interface should only be used when in-kernel device declaration
+can't be done, there is a variety of cases where it can be helpful:
+* The I2C driver usually detects devices (method 3 above) but the bus
+ segment your device lives on doesn't have the proper class bit set and
+ thus detection doesn't trigger.
+* The I2C driver usually detects devices, but your device lives at an
+ unexpected address.
+* The I2C driver usually detects devices, but your device is not detected,
+ either because the detection routine is too strict, or because your
+ device is not officially supported yet but you know it is compatible.
+* You are developing a driver on a test board, where you soldered the I2C
+ device yourself.
+
+This interface is a replacement for the force_* module parameters some I2C
+drivers implement. Being implemented in i2c-core rather than in each
+device driver individually, it is much more efficient, and also has the
+advantage that you do not have to reload the driver to change a setting.
+You can also instantiate the device before the driver is loaded or even
+available, and you don't need to know what driver the device needs.
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients
index c1a06f989cf..7860aafb483 100644
--- a/Documentation/i2c/writing-clients
+++ b/Documentation/i2c/writing-clients
@@ -126,19 +126,9 @@ different) configuration information, as do drivers handling chip variants
that can't be distinguished by protocol probing, or which need some board
specific information to operate correctly.
-Accordingly, the I2C stack now has two models for associating I2C devices
-with their drivers: the original "legacy" model, and a newer one that's
-fully compatible with the Linux 2.6 driver model. These models do not mix,
-since the "legacy" model requires drivers to create "i2c_client" device
-objects after SMBus style probing, while the Linux driver model expects
-drivers to be given such device objects in their probe() routines.
-The legacy model is deprecated now and will soon be removed, so we no
-longer document it here.
-
-
-Standard Driver Model Binding ("New Style")
--------------------------------------------
+Device/Driver Binding
+---------------------
System infrastructure, typically board-specific initialization code or
boot firmware, reports what I2C devices exist. For example, there may be
@@ -201,7 +191,7 @@ a given I2C bus. This is for example the case of hardware monitoring
devices on a PC's SMBus. In that case, you may want to let your driver
detect supported devices automatically. This is how the legacy model
was working, and is now available as an extension to the standard
-driver model (so that we can finally get rid of the legacy model.)
+driver model.
You simply have to define a detect callback which will attempt to
identify supported devices (returning 0 for supported ones and -ENODEV
diff --git a/Documentation/ide/ide.txt b/Documentation/ide/ide.txt
index 0c78f4b1d9d..e77bebfa7b0 100644
--- a/Documentation/ide/ide.txt
+++ b/Documentation/ide/ide.txt
@@ -216,6 +216,8 @@ Other kernel parameters for ide_core are:
* "noflush=[interface_number.device_number]" to disable flush requests
+* "nohpa=[interface_number.device_number]" to disable Host Protected Area
+
* "noprobe=[interface_number.device_number]" to skip probing
* "nowerr=[interface_number.device_number]" to ignore the WRERR_STAT bit
diff --git a/Documentation/input/sentelic.txt b/Documentation/input/sentelic.txt
new file mode 100644
index 00000000000..f7160a2fb6a
--- /dev/null
+++ b/Documentation/input/sentelic.txt
@@ -0,0 +1,475 @@
+Copyright (C) 2002-2008 Sentelic Corporation.
+Last update: Oct-31-2008
+
+==============================================================================
+* Finger Sensing Pad Intellimouse Mode(scrolling wheel, 4th and 5th buttons)
+==============================================================================
+A) MSID 4: Scrolling wheel mode plus Forward page(4th button) and Backward
+ page (5th button)
+@1. Set sample rate to 200;
+@2. Set sample rate to 200;
+@3. Set sample rate to 80;
+@4. Issuing the "Get device ID" command (0xF2) and waits for the response;
+@5. FSP will respond 0x04.
+
+Packet 1
+ Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
+ 1 |Y|X|y|x|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 | | |B|F|W|W|W|W|
+ |---------------| |---------------| |---------------| |---------------|
+
+Byte 1: Bit7 => Y overflow
+ Bit6 => X overflow
+ Bit5 => Y sign bit
+ Bit4 => X sign bit
+ Bit3 => 1
+ Bit2 => Middle Button, 1 is pressed, 0 is not pressed.
+ Bit1 => Right Button, 1 is pressed, 0 is not pressed.
+ Bit0 => Left Button, 1 is pressed, 0 is not pressed.
+Byte 2: X Movement(9-bit 2's complement integers)
+Byte 3: Y Movement(9-bit 2's complement integers)
+Byte 4: Bit3~Bit0 => the scrolling wheel's movement since the last data report.
+ valid values, -8 ~ +7
+ Bit4 => 1 = 4th mouse button is pressed, Forward one page.
+ 0 = 4th mouse button is not pressed.
+ Bit5 => 1 = 5th mouse button is pressed, Backward one page.
+ 0 = 5th mouse button is not pressed.
+
+B) MSID 6: Horizontal and Vertical scrolling.
+@ Set bit 1 in register 0x40 to 1
+
+# FSP replaces scrolling wheel's movement as 4 bits to show horizontal and
+ vertical scrolling.
+
+Packet 1
+ Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
+ 1 |Y|X|y|x|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 | | |B|F|l|r|u|d|
+ |---------------| |---------------| |---------------| |---------------|
+
+Byte 1: Bit7 => Y overflow
+ Bit6 => X overflow
+ Bit5 => Y sign bit
+ Bit4 => X sign bit
+ Bit3 => 1
+ Bit2 => Middle Button, 1 is pressed, 0 is not pressed.
+ Bit1 => Right Button, 1 is pressed, 0 is not pressed.
+ Bit0 => Left Button, 1 is pressed, 0 is not pressed.
+Byte 2: X Movement(9-bit 2's complement integers)
+Byte 3: Y Movement(9-bit 2's complement integers)
+Byte 4: Bit0 => the Vertical scrolling movement downward.
+ Bit1 => the Vertical scrolling movement upward.
+ Bit2 => the Vertical scrolling movement rightward.
+ Bit3 => the Vertical scrolling movement leftward.
+ Bit4 => 1 = 4th mouse button is pressed, Forward one page.
+ 0 = 4th mouse button is not pressed.
+ Bit5 => 1 = 5th mouse button is pressed, Backward one page.
+ 0 = 5th mouse button is not pressed.
+
+C) MSID 7:
+# FSP uses 2 packets(8 Bytes) data to represent Absolute Position
+ so we have PACKET NUMBER to identify packets.
+ If PACKET NUMBER is 0, the packet is Packet 1.
+ If PACKET NUMBER is 1, the packet is Packet 2.
+ Please count this number in program.
+
+# MSID6 special packet will be enable at the same time when enable MSID 7.
+
+==============================================================================
+* Absolute position for STL3886-G0.
+==============================================================================
+@ Set bit 2 or 3 in register 0x40 to 1
+@ Set bit 6 in register 0x40 to 1
+
+Packet 1 (ABSOLUTE POSITION)
+ Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
+ 1 |0|1|V|1|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |r|l|d|u|X|X|Y|Y|
+ |---------------| |---------------| |---------------| |---------------|
+
+Byte 1: Bit7~Bit6 => 00, Normal data packet
+ => 01, Absolute coordination packet
+ => 10, Notify packet
+ Bit5 => valid bit
+ Bit4 => 1
+ Bit3 => 1
+ Bit2 => Middle Button, 1 is pressed, 0 is not pressed.
+ Bit1 => Right Button, 1 is pressed, 0 is not pressed.
+ Bit0 => Left Button, 1 is pressed, 0 is not pressed.
+Byte 2: X coordinate (xpos[9:2])
+Byte 3: Y coordinate (ypos[9:2])
+Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0])
+ Bit3~Bit2 => X coordinate (ypos[1:0])
+ Bit4 => scroll up
+ Bit5 => scroll down
+ Bit6 => scroll left
+ Bit7 => scroll right
+
+Notify Packet for G0
+ Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
+ 1 |1|0|0|1|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |M|M|M|M|M|M|M|M| 4 |0|0|0|0|0|0|0|0|
+ |---------------| |---------------| |---------------| |---------------|
+
+Byte 1: Bit7~Bit6 => 00, Normal data packet
+ => 01, Absolute coordination packet
+ => 10, Notify packet
+ Bit5 => 0
+ Bit4 => 1
+ Bit3 => 1
+ Bit2 => Middle Button, 1 is pressed, 0 is not pressed.
+ Bit1 => Right Button, 1 is pressed, 0 is not pressed.
+ Bit0 => Left Button, 1 is pressed, 0 is not pressed.
+Byte 2: Message Type => 0x5A (Enable/Disable status packet)
+ Mode Type => 0xA5 (Normal/Icon mode status)
+Byte 3: Message Type => 0x00 (Disabled)
+ => 0x01 (Enabled)
+ Mode Type => 0x00 (Normal)
+ => 0x01 (Icon)
+Byte 4: Bit7~Bit0 => Don't Care
+
+==============================================================================
+* Absolute position for STL3888-A0.
+==============================================================================
+Packet 1 (ABSOLUTE POSITION)
+ Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
+ 1 |0|1|V|A|1|L|0|1| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |x|x|y|y|X|X|Y|Y|
+ |---------------| |---------------| |---------------| |---------------|
+
+Byte 1: Bit7~Bit6 => 00, Normal data packet
+ => 01, Absolute coordination packet
+ => 10, Notify packet
+ Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up.
+ When both fingers are up, the last two reports have zero valid
+ bit.
+ Bit4 => arc
+ Bit3 => 1
+ Bit2 => Left Button, 1 is pressed, 0 is released.
+ Bit1 => 0
+ Bit0 => 1
+Byte 2: X coordinate (xpos[9:2])
+Byte 3: Y coordinate (ypos[9:2])
+Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0])
+ Bit3~Bit2 => X coordinate (ypos[1:0])
+ Bit5~Bit4 => y1_g
+ Bit7~Bit6 => x1_g
+
+Packet 2 (ABSOLUTE POSITION)
+ Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
+ 1 |0|1|V|A|1|R|1|0| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |x|x|y|y|X|X|Y|Y|
+ |---------------| |---------------| |---------------| |---------------|
+
+Byte 1: Bit7~Bit6 => 00, Normal data packet
+ => 01, Absolute coordinates packet
+ => 10, Notify packet
+ Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up.
+ When both fingers are up, the last two reports have zero valid
+ bit.
+ Bit4 => arc
+ Bit3 => 1
+ Bit2 => Right Button, 1 is pressed, 0 is released.
+ Bit1 => 1
+ Bit0 => 0
+Byte 2: X coordinate (xpos[9:2])
+Byte 3: Y coordinate (ypos[9:2])
+Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0])
+ Bit3~Bit2 => X coordinate (ypos[1:0])
+ Bit5~Bit4 => y2_g
+ Bit7~Bit6 => x2_g
+
+Notify Packet for STL3888-A0
+ Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
+ 1 |1|0|1|P|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |0|0|F|F|0|0|0|i| 4 |r|l|d|u|0|0|0|0|
+ |---------------| |---------------| |---------------| |---------------|
+
+Byte 1: Bit7~Bit6 => 00, Normal data packet
+ => 01, Absolute coordination packet
+ => 10, Notify packet
+ Bit5 => 1
+ Bit4 => when in absolute coordinates mode (valid when EN_PKT_GO is 1):
+ 0: left button is generated by the on-pad command
+ 1: left button is generated by the external button
+ Bit3 => 1
+ Bit2 => Middle Button, 1 is pressed, 0 is not pressed.
+ Bit1 => Right Button, 1 is pressed, 0 is not pressed.
+ Bit0 => Left Button, 1 is pressed, 0 is not pressed.
+Byte 2: Message Type => 0xB7 (Multi Finger, Multi Coordinate mode)
+Byte 3: Bit7~Bit6 => Don't care
+ Bit5~Bit4 => Number of fingers
+ Bit3~Bit1 => Reserved
+ Bit0 => 1: enter gesture mode; 0: leaving gesture mode
+Byte 4: Bit7 => scroll right button
+ Bit6 => scroll left button
+ Bit5 => scroll down button
+ Bit4 => scroll up button
+ * Note that if gesture and additional button (Bit4~Bit7)
+ happen at the same time, the button information will not
+ be sent.
+ Bit3~Bit0 => Reserved
+
+Sample sequence of Multi-finger, Multi-coordinate mode:
+
+ notify packet (valid bit == 1), abs pkt 1, abs pkt 2, abs pkt 1,
+ abs pkt 2, ..., notify packet(valid bit == 0)
+
+==============================================================================
+* FSP Enable/Disable packet
+==============================================================================
+ Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
+ 1 |Y|X|0|0|1|M|R|L| 2 |0|1|0|1|1|0|1|E| 3 | | | | | | | | | 4 | | | | | | | | |
+ |---------------| |---------------| |---------------| |---------------|
+
+FSP will send out enable/disable packet when FSP receive PS/2 enable/disable
+command. Host will receive the packet which Middle, Right, Left button will
+be set. The packet only use byte 0 and byte 1 as a pattern of original packet.
+Ignore the other bytes of the packet.
+
+Byte 1: Bit7 => 0, Y overflow
+ Bit6 => 0, X overflow
+ Bit5 => 0, Y sign bit
+ Bit4 => 0, X sign bit
+ Bit3 => 1
+ Bit2 => 1, Middle Button
+ Bit1 => 1, Right Button
+ Bit0 => 1, Left Button
+Byte 2: Bit7~1 => (0101101b)
+ Bit0 => 1 = Enable
+ 0 = Disable
+Byte 3: Don't care
+Byte 4: Don't care (MOUSE ID 3, 4)
+Byte 5~8: Don't care (Absolute packet)
+
+==============================================================================
+* PS/2 Command Set
+==============================================================================
+
+FSP supports basic PS/2 commanding set and modes, refer to following URL for
+details about PS/2 commands:
+
+http://www.computer-engineering.org/index.php?title=PS/2_Mouse_Interface
+
+==============================================================================
+* Programming Sequence for Determining Packet Parsing Flow
+==============================================================================
+1. Identify FSP by reading device ID(0x00) and version(0x01) register
+
+2. Determine number of buttons by reading status2 (0x0b) register
+
+ buttons = reg[0x0b] & 0x30
+
+ if buttons == 0x30 or buttons == 0x20:
+ # two/four buttons
+ Refer to 'Finger Sensing Pad PS/2 Mouse Intellimouse'
+ section A for packet parsing detail(ignore byte 4, bit ~ 7)
+ elif buttons == 0x10:
+ # 6 buttons
+ Refer to 'Finger Sensing Pad PS/2 Mouse Intellimouse'
+ section B for packet parsing detail
+ elif buttons == 0x00:
+ # 6 buttons
+ Refer to 'Finger Sensing Pad PS/2 Mouse Intellimouse'
+ section A for packet parsing detail
+
+==============================================================================
+* Programming Sequence for Register Reading/Writing
+==============================================================================
+
+Register inversion requirement:
+
+ Following values needed to be inverted(the '~' operator in C) before being
+sent to FSP:
+
+ 0xe9, 0xee, 0xf2 and 0xff.
+
+Register swapping requirement:
+
+ Following values needed to have their higher 4 bits and lower 4 bits being
+swapped before being sent to FSP:
+
+ 10, 20, 40, 60, 80, 100 and 200.
+
+Register reading sequence:
+
+ 1. send 0xf3 PS/2 command to FSP;
+
+ 2. send 0x66 PS/2 command to FSP;
+
+ 3. send 0x88 PS/2 command to FSP;
+
+ 4. send 0xf3 PS/2 command to FSP;
+
+ 5. if the register address being to read is not required to be
+ inverted(refer to the 'Register inversion requirement' section),
+ goto step 6
+
+ 5a. send 0x68 PS/2 command to FSP;
+
+ 5b. send the inverted register address to FSP and goto step 8;
+
+ 6. if the register address being to read is not required to be
+ swapped(refer to the 'Register swapping requirement' section),
+ goto step 7
+
+ 6a. send 0xcc PS/2 command to FSP;
+
+ 6b. send the swapped register address to FSP and goto step 8;
+
+ 7. send 0x66 PS/2 command to FSP;
+
+ 7a. send the original register address to FSP and goto step 8;
+
+ 8. send 0xe9(status request) PS/2 command to FSP;
+
+ 9. the response read from FSP should be the requested register value.
+
+Register writing sequence:
+
+ 1. send 0xf3 PS/2 command to FSP;
+
+ 2. if the register address being to write is not required to be
+ inverted(refer to the 'Register inversion requirement' section),
+ goto step 3
+
+ 2a. send 0x74 PS/2 command to FSP;
+
+ 2b. send the inverted register address to FSP and goto step 5;
+
+ 3. if the register address being to write is not required to be
+ swapped(refer to the 'Register swapping requirement' section),
+ goto step 4
+
+ 3a. send 0x77 PS/2 command to FSP;
+
+ 3b. send the swapped register address to FSP and goto step 5;
+
+ 4. send 0x55 PS/2 command to FSP;
+
+ 4a. send the register address to FSP and goto step 5;
+
+ 5. send 0xf3 PS/2 command to FSP;
+
+ 6. if the register value being to write is not required to be
+ inverted(refer to the 'Register inversion requirement' section),
+ goto step 7
+
+ 6a. send 0x47 PS/2 command to FSP;
+
+ 6b. send the inverted register value to FSP and goto step 9;
+
+ 7. if the register value being to write is not required to be
+ swapped(refer to the 'Register swapping requirement' section),
+ goto step 8
+
+ 7a. send 0x44 PS/2 command to FSP;
+
+ 7b. send the swapped register value to FSP and goto step 9;
+
+ 8. send 0x33 PS/2 command to FSP;
+
+ 8a. send the register value to FSP;
+
+ 9. the register writing sequence is completed.
+
+==============================================================================
+* Register Listing
+==============================================================================
+
+offset width default r/w name
+0x00 bit7~bit0 0x01 RO device ID
+
+0x01 bit7~bit0 0xc0 RW version ID
+
+0x02 bit7~bit0 0x01 RO vendor ID
+
+0x03 bit7~bit0 0x01 RO product ID
+
+0x04 bit3~bit0 0x01 RW revision ID
+
+0x0b RO test mode status 1
+ bit3 1 RO 0: rotate 180 degree, 1: no rotation
+
+ bit5~bit4 RO number of buttons
+ 11 => 2, lbtn/rbtn
+ 10 => 4, lbtn/rbtn/scru/scrd
+ 01 => 6, lbtn/rbtn/scru/scrd/scrl/scrr
+ 00 => 6, lbtn/rbtn/scru/scrd/fbtn/bbtn
+
+0x0f RW register file page control
+ bit0 0 RW 1 to enable page 1 register files
+
+0x10 RW system control 1
+ bit0 1 RW Reserved, must be 1
+ bit1 0 RW Reserved, must be 0
+ bit4 1 RW Reserved, must be 0
+ bit5 0 RW register clock gating enable
+ 0: read only, 1: read/write enable
+ (Note that following registers does not require clock gating being
+ enabled prior to write: 05 06 07 08 09 0c 0f 10 11 12 16 17 18 23 2e
+ 40 41 42 43.)
+
+0x31 RW on-pad command detection
+ bit7 0 RW on-pad command left button down tag
+ enable
+ 0: disable, 1: enable
+
+0x34 RW on-pad command control 5
+ bit4~bit0 0x05 RW XLO in 0s/4/1, so 03h = 0010.1b = 2.5
+ (Note that position unit is in 0.5 scanline)
+
+ bit7 0 RW on-pad tap zone enable
+ 0: disable, 1: enable
+
+0x35 RW on-pad command control 6
+ bit4~bit0 0x1d RW XHI in 0s/4/1, so 19h = 1100.1b = 12.5
+ (Note that position unit is in 0.5 scanline)
+
+0x36 RW on-pad command control 7
+ bit4~bit0 0x04 RW YLO in 0s/4/1, so 03h = 0010.1b = 2.5
+ (Note that position unit is in 0.5 scanline)
+
+0x37 RW on-pad command control 8
+ bit4~bit0 0x13 RW YHI in 0s/4/1, so 11h = 1000.1b = 8.5
+ (Note that position unit is in 0.5 scanline)
+
+0x40 RW system control 5
+ bit1 0 RW FSP Intellimouse mode enable
+ 0: disable, 1: enable
+
+ bit2 0 RW movement + abs. coordinate mode enable
+ 0: disable, 1: enable
+ (Note that this function has the functionality of bit 1 even when
+ bit 1 is not set. However, the format is different from that of bit 1.
+ In addition, when bit 1 and bit 2 are set at the same time, bit 2 will
+ override bit 1.)
+
+ bit3 0 RW abs. coordinate only mode enable
+ 0: disable, 1: enable
+ (Note that this function has the functionality of bit 1 even when
+ bit 1 is not set. However, the format is different from that of bit 1.
+ In addition, when bit 1, bit 2 and bit 3 are set at the same time,
+ bit 3 will override bit 1 and 2.)
+
+ bit5 0 RW auto switch enable
+ 0: disable, 1: enable
+
+ bit6 0 RW G0 abs. + notify packet format enable
+ 0: disable, 1: enable
+ (Note that the absolute/relative coordinate output still depends on
+ bit 2 and 3. That is, if any of those bit is 1, host will receive
+ absolute coordinates; otherwise, host only receives packets with
+ relative coordinate.)
+
+0x43 RW on-pad control
+ bit0 0 RW on-pad control enable
+ 0: disable, 1: enable
+ (Note that if this bit is cleared, bit 3/5 will be ineffective)
+
+ bit3 0 RW on-pad fix vertical scrolling enable
+ 0: disable, 1: enable
+
+ bit5 0 RW on-pad fix horizontal scrolling enable
+ 0: disable, 1: enable
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 1f779a25c70..dbea4f95fc8 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -139,6 +139,7 @@ Code Seq# Include File Comments
'm' all linux/synclink.h conflict!
'm' 00-1F net/irda/irmod.h conflict!
'n' 00-7F linux/ncp_fs.h
+'n' 80-8F linux/nilfs2_fs.h NILFS2
'n' E0-FF video/matrox.h matroxfb
'o' 00-1F fs/ocfs2/ocfs2_fs.h OCFS2
'o' 00-03 include/mtd/ubi-user.h conflict! (OCFS2 and UBI overlaps)
@@ -149,6 +150,8 @@ Code Seq# Include File Comments
'p' 40-7F linux/nvram.h
'p' 80-9F user-space parport
<mailto:tim@cyberelk.net>
+'p' a1-a4 linux/pps.h LinuxPPS
+ <mailto:giometti@linux.it>
'q' 00-1F linux/serio.h
'q' 80-FF Internet PhoneJACK, Internet LineJACK
<http://www.quicknet.net>
diff --git a/Documentation/isdn/00-INDEX b/Documentation/isdn/00-INDEX
index 5a2d69989a8..e87e336f590 100644
--- a/Documentation/isdn/00-INDEX
+++ b/Documentation/isdn/00-INDEX
@@ -14,39 +14,37 @@ README
- general info on what you need and what to do for Linux ISDN.
README.FAQ
- general info for FAQ.
+README.HiSax
+ - info on the HiSax driver which replaces the old teles.
+README.act2000
+ - info on driver for IBM ACT-2000 card.
README.audio
- info for running audio over ISDN.
+README.avmb1
+ - info on driver for AVM-B1 ISDN card.
+README.concap
+ - info on "CONCAP" encapsulation protocol interface used for X.25.
+README.diversion
+ - info on module for isdn diversion services.
README.fax
- info for using Fax over ISDN.
README.gigaset
- - info on the drivers for Siemens Gigaset ISDN adapters.
-README.icn
- - info on the ICN-ISDN-card and its driver.
-README.HiSax
- - info on the HiSax driver which replaces the old teles.
+ - info on the drivers for Siemens Gigaset ISDN adapters
README.hfc-pci
- info on hfc-pci based cards.
+README.hysdn
+ - info on driver for Hypercope active HYSDN cards
+README.icn
+ - info on the ICN-ISDN-card and its driver.
+README.mISDN
+ - info on the Modular ISDN subsystem (mISDN)
README.pcbit
- info on the PCBIT-D ISDN adapter and driver.
-README.syncppp
- - info on running Sync PPP over ISDN.
-syncPPP.FAQ
- - frequently asked questions about running PPP over ISDN.
-README.avmb1
- - info on driver for AVM-B1 ISDN card.
-README.act2000
- - info on driver for IBM ACT-2000 card.
-README.eicon
- - info on driver for Eicon active cards.
-README.concap
- - info on "CONCAP" encapsulation protocol interface used for X.25.
-README.diversion
- - info on module for isdn diversion services.
README.sc
- info on driver for Spellcaster cards.
+README.syncppp
+ - info on running Sync PPP over ISDN.
README.x25
- info for running X.25 over ISDN.
-README.hysdn
- - info on driver for Hypercope active HYSDN cards
-README.mISDN
- - info on the Modular ISDN subsystem (mISDN).
+syncPPP.FAQ
+ - frequently asked questions about running PPP over ISDN.
diff --git a/Documentation/isdn/INTERFACE.CAPI b/Documentation/isdn/INTERFACE.CAPI
index 786d619b36e..686e107923e 100644
--- a/Documentation/isdn/INTERFACE.CAPI
+++ b/Documentation/isdn/INTERFACE.CAPI
@@ -45,7 +45,7 @@ From then on, Kernel CAPI may call the registered callback functions for the
device.
If the device becomes unusable for any reason (shutdown, disconnect ...), the
-driver has to call capi_ctr_reseted(). This will prevent further calls to the
+driver has to call capi_ctr_down(). This will prevent further calls to the
callback functions by Kernel CAPI.
@@ -114,20 +114,36 @@ char *driver_name
int (*load_firmware)(struct capi_ctr *ctrlr, capiloaddata *ldata)
(optional) pointer to a callback function for sending firmware and
configuration data to the device
+ Return value: 0 on success, error code on error
+ Called in process context.
void (*reset_ctr)(struct capi_ctr *ctrlr)
- pointer to a callback function for performing a reset on the device,
- releasing all registered applications
+ (optional) pointer to a callback function for performing a reset on
+ the device, releasing all registered applications
+ Called in process context.
void (*register_appl)(struct capi_ctr *ctrlr, u16 applid,
capi_register_params *rparam)
void (*release_appl)(struct capi_ctr *ctrlr, u16 applid)
pointers to callback functions for registration and deregistration of
applications with the device
+ Calls to these functions are serialized by Kernel CAPI so that only
+ one call to any of them is active at any time.
u16 (*send_message)(struct capi_ctr *ctrlr, struct sk_buff *skb)
pointer to a callback function for sending a CAPI message to the
device
+ Return value: CAPI error code
+ If the method returns 0 (CAPI_NOERROR) the driver has taken ownership
+ of the skb and the caller may no longer access it. If it returns a
+ non-zero (error) value then ownership of the skb returns to the caller
+ who may reuse or free it.
+ The return value should only be used to signal problems with respect
+ to accepting or queueing the message. Errors occurring during the
+ actual processing of the message should be signaled with an
+ appropriate reply message.
+ Calls to this function are not serialized by Kernel CAPI, ie. it must
+ be prepared to be re-entered.
char *(*procinfo)(struct capi_ctr *ctrlr)
pointer to a callback function returning the entry for the device in
@@ -138,6 +154,8 @@ read_proc_t *ctr_read_proc
system entry, /proc/capi/controllers/<n>; will be called with a
pointer to the device's capi_ctr structure as the last (data) argument
+Note: Callback functions are never called in interrupt context.
+
- to be filled in before calling capi_ctr_ready():
u8 manu[CAPI_MANUFACTURER_LEN]
@@ -153,6 +171,45 @@ u8 serial[CAPI_SERIAL_LEN]
value to return for CAPI_GET_SERIAL
+4.3 The _cmsg Structure
+
+(declared in <linux/isdn/capiutil.h>)
+
+The _cmsg structure stores the contents of a CAPI 2.0 message in an easily
+accessible form. It contains members for all possible CAPI 2.0 parameters, of
+which only those appearing in the message type currently being processed are
+actually used. Unused members should be set to zero.
+
+Members are named after the CAPI 2.0 standard names of the parameters they
+represent. See <linux/isdn/capiutil.h> for the exact spelling. Member data
+types are:
+
+u8 for CAPI parameters of type 'byte'
+
+u16 for CAPI parameters of type 'word'
+
+u32 for CAPI parameters of type 'dword'
+
+_cstruct for CAPI parameters of type 'struct' not containing any
+ variably-sized (struct) subparameters (eg. 'Called Party Number')
+ The member is a pointer to a buffer containing the parameter in
+ CAPI encoding (length + content). It may also be NULL, which will
+ be taken to represent an empty (zero length) parameter.
+
+_cmstruct for CAPI parameters of type 'struct' containing 'struct'
+ subparameters ('Additional Info' and 'B Protocol')
+ The representation is a single byte containing one of the values:
+ CAPI_DEFAULT: the parameter is empty
+ CAPI_COMPOSE: the values of the subparameters are stored
+ individually in the corresponding _cmsg structure members
+
+Functions capi_cmsg2message() and capi_message2cmsg() are provided to convert
+messages between their transport encoding described in the CAPI 2.0 standard
+and their _cmsg structure representation. Note that capi_cmsg2message() does
+not know or check the size of its destination buffer. The caller must make
+sure it is big enough to accomodate the resulting CAPI message.
+
+
5. Lower Layer Interface Functions
(declared in <linux/isdn/capilli.h>)
@@ -166,7 +223,7 @@ int detach_capi_ctr(struct capi_ctr *ctrlr)
register/unregister a device (controller) with Kernel CAPI
void capi_ctr_ready(struct capi_ctr *ctrlr)
-void capi_ctr_reseted(struct capi_ctr *ctrlr)
+void capi_ctr_down(struct capi_ctr *ctrlr)
signal controller ready/not ready
void capi_ctr_suspend_output(struct capi_ctr *ctrlr)
@@ -211,3 +268,32 @@ CAPIMSG_CONTROL(m) CAPIMSG_SETCONTROL(m, contr) Controller/PLCI/NCCI
(u32)
CAPIMSG_DATALEN(m) CAPIMSG_SETDATALEN(m, len) Data Length (u16)
+
+Library functions for working with _cmsg structures
+(from <linux/isdn/capiutil.h>):
+
+unsigned capi_cmsg2message(_cmsg *cmsg, u8 *msg)
+ Assembles a CAPI 2.0 message from the parameters in *cmsg, storing the
+ result in *msg.
+
+unsigned capi_message2cmsg(_cmsg *cmsg, u8 *msg)
+ Disassembles the CAPI 2.0 message in *msg, storing the parameters in
+ *cmsg.
+
+unsigned capi_cmsg_header(_cmsg *cmsg, u16 ApplId, u8 Command, u8 Subcommand,
+ u16 Messagenumber, u32 Controller)
+ Fills the header part and address field of the _cmsg structure *cmsg
+ with the given values, zeroing the remainder of the structure so only
+ parameters with non-default values need to be changed before sending
+ the message.
+
+void capi_cmsg_answer(_cmsg *cmsg)
+ Sets the low bit of the Subcommand field in *cmsg, thereby converting
+ _REQ to _CONF and _IND to _RESP.
+
+char *capi_cmd2str(u8 Command, u8 Subcommand)
+ Returns the CAPI 2.0 message name corresponding to the given command
+ and subcommand values, as a static ASCII string. The return value may
+ be NULL if the command/subcommand is not one of those defined in the
+ CAPI 2.0 standard.
+
diff --git a/Documentation/isdn/README.gigaset b/Documentation/isdn/README.gigaset
index 02c0e9341dd..f9963103ae3 100644
--- a/Documentation/isdn/README.gigaset
+++ b/Documentation/isdn/README.gigaset
@@ -149,10 +149,8 @@ GigaSet 307x Device Driver
configuration files and chat scripts in the gigaset-VERSION/ppp directory
in the driver packages from http://sourceforge.net/projects/gigaset307x/.
Please note that the USB drivers are not able to change the state of the
- control lines (the M105 driver can be configured to use some undocumented
- control requests, if you really need the control lines, though). This means
- you must use "Stupid Mode" if you are using wvdial or you should use the
- nocrtscts option of pppd.
+ control lines. This means you must use "Stupid Mode" if you are using
+ wvdial or you should use the nocrtscts option of pppd.
You must also assure that the ppp_async module is loaded with the parameter
flag_time=0. You can do this e.g. by adding a line like
@@ -190,20 +188,19 @@ GigaSet 307x Device Driver
You can also use /sys/class/tty/ttyGxy/cidmode for changing the CID mode
setting (ttyGxy is ttyGU0 or ttyGB0).
-2.6. M105 Undocumented USB Requests
- ------------------------------
-
- The Gigaset M105 USB data box understands a couple of useful, but
- undocumented USB commands. These requests are not used in normal
- operation (for wireless access to the base), but are needed for access
- to the M105's own configuration mode (registration to the base, baudrate
- and line format settings, device status queries) via the gigacontr
- utility. Their use is controlled by the kernel configuration option
- "Support for undocumented USB requests" (CONFIG_GIGASET_UNDOCREQ). If you
- encounter error code -ENOTTY when trying to use some features of the
- M105, try setting that option to "y" via 'make {x,menu}config' and
- recompiling the driver.
-
+2.6. Unregistered Wireless Devices (M101/M105)
+ -----------------------------------------
+ The main purpose of the ser_gigaset and usb_gigaset drivers is to allow
+ the M101 and M105 wireless devices to be used as ISDN devices for ISDN
+ connections through a Gigaset base. Therefore they assume that the device
+ is registered to a DECT base.
+
+ If the M101/M105 device is not registered to a base, initialization of
+ the device fails, and a corresponding error message is logged by the
+ driver. In that situation, a restricted set of functions is available
+ which includes, in particular, those necessary for registering the device
+ to a base or for switching it between Fixed Part and Portable Part
+ modes.
3. Troubleshooting
---------------
@@ -234,11 +231,12 @@ GigaSet 307x Device Driver
Select Unimodem mode for all DECT data adapters. (see section 2.4.)
Problem:
- You want to configure your USB DECT data adapter (M105) but gigacontr
- reports an error: "/dev/ttyGU0: Inappropriate ioctl for device".
+ Messages like this:
+ usb_gigaset 3-2:1.0: Could not initialize the device.
+ appear in your syslog.
Solution:
- Recompile the usb_gigaset driver with the kernel configuration option
- CONFIG_GIGASET_UNDOCREQ set to 'y'. (see section 2.6.)
+ Check whether your M10x wireless device is correctly registered to the
+ Gigaset base. (see section 2.6.)
3.2. Telling the driver to provide more information
----------------------------------------------
diff --git a/Documentation/ja_JP/SubmitChecklist b/Documentation/ja_JP/SubmitChecklist
index 6c42e071d72..2df4576f117 100644
--- a/Documentation/ja_JP/SubmitChecklist
+++ b/Documentation/ja_JP/SubmitChecklist
@@ -75,7 +75,7 @@ Linux カーネルパッチ投稿者向けチェックリスト
ビルドした上、動作確認を行ってください。
14: もしパッチがディスクのI/O性能などに影響を与えるようであれば、
- 'CONFIG_LBD'オプションを有効にした場合と無効にした場合の両方で
+ 'CONFIG_LBDAF'オプションを有効にした場合と無効にした場合の両方で
テストを実施してみてください。
15: lockdepの機能を全て有効にした上で、全てのコードパスを評価してください。
diff --git a/Documentation/kbuild/kconfig.txt b/Documentation/kbuild/kconfig.txt
index 26a7c0a9319..849b5e56d06 100644
--- a/Documentation/kbuild/kconfig.txt
+++ b/Documentation/kbuild/kconfig.txt
@@ -35,48 +35,26 @@ new .config files to see the differences:
(Yes, we need something better here.)
-
-======================================================================
-menuconfig
---------------------------------------------------
-
-SEARCHING for CONFIG symbols
-
-Searching in menuconfig:
-
- The Search function searches for kernel configuration symbol
- names, so you have to know something close to what you are
- looking for.
-
- Example:
- /hotplug
- This lists all config symbols that contain "hotplug",
- e.g., HOTPLUG, HOTPLUG_CPU, MEMORY_HOTPLUG.
-
- For search help, enter / followed TAB-TAB-TAB (to highlight
- <Help>) and Enter. This will tell you that you can also use
- regular expressions (regexes) in the search string, so if you
- are not interested in MEMORY_HOTPLUG, you could try
-
- /^hotplug
-
-
______________________________________________________________________
-Color Themes for 'menuconfig'
+Environment variables for '*config'
-It is possible to select different color themes using the variable
-MENUCONFIG_COLOR. To select a theme use:
+KCONFIG_CONFIG
+--------------------------------------------------
+This environment variable can be used to specify a default kernel config
+file name to override the default name of ".config".
- make MENUCONFIG_COLOR=<theme> menuconfig
+KCONFIG_OVERWRITECONFIG
+--------------------------------------------------
+If you set KCONFIG_OVERWRITECONFIG in the environment, Kconfig will not
+break symlinks when .config is a symlink to somewhere else.
-Available themes are:
- mono => selects colors suitable for monochrome displays
- blackbg => selects a color scheme with black background
- classic => theme with blue background. The classic look
- bluetitle => a LCD friendly version of classic. (default)
+KCONFIG_NOTIMESTAMP
+--------------------------------------------------
+If this environment variable exists and is non-null, the timestamp line
+in generated .config files is omitted.
______________________________________________________________________
-Environment variables in 'menuconfig'
+Environment variables for '{allyes/allmod/allno/rand}config'
KCONFIG_ALLCONFIG
--------------------------------------------------
@@ -95,8 +73,7 @@ values.
This enables you to create "miniature" config (miniconfig) or custom
config files containing just the config symbols that you are interested
in. Then the kernel config system generates the full .config file,
-including dependencies of your miniconfig file, based on the miniconfig
-file.
+including symbols of your miniconfig file.
This 'KCONFIG_ALLCONFIG' file is a config file which contains
(usually a subset of all) preset config symbols. These variable
@@ -113,26 +90,14 @@ These examples will disable most options (allnoconfig) but enable or
disable the options that are explicitly listed in the specified
mini-config files.
+______________________________________________________________________
+Environment variables for 'silentoldconfig'
+
KCONFIG_NOSILENTUPDATE
--------------------------------------------------
If this variable has a non-blank value, it prevents silent kernel
config udpates (requires explicit updates).
-KCONFIG_CONFIG
---------------------------------------------------
-This environment variable can be used to specify a default kernel config
-file name to override the default name of ".config".
-
-KCONFIG_OVERWRITECONFIG
---------------------------------------------------
-If you set KCONFIG_OVERWRITECONFIG in the environment, Kconfig will not
-break symlinks when .config is a symlink to somewhere else.
-
-KCONFIG_NOTIMESTAMP
---------------------------------------------------
-If this environment variable exists and is non-null, the timestamp line
-in generated .config files is omitted.
-
KCONFIG_AUTOCONFIG
--------------------------------------------------
This environment variable can be set to specify the path & name of the
@@ -143,15 +108,54 @@ KCONFIG_AUTOHEADER
This environment variable can be set to specify the path & name of the
"autoconf.h" (header) file. Its default value is "include/linux/autoconf.h".
+
+======================================================================
+menuconfig
+--------------------------------------------------
+
+SEARCHING for CONFIG symbols
+
+Searching in menuconfig:
+
+ The Search function searches for kernel configuration symbol
+ names, so you have to know something close to what you are
+ looking for.
+
+ Example:
+ /hotplug
+ This lists all config symbols that contain "hotplug",
+ e.g., HOTPLUG, HOTPLUG_CPU, MEMORY_HOTPLUG.
+
+ For search help, enter / followed TAB-TAB-TAB (to highlight
+ <Help>) and Enter. This will tell you that you can also use
+ regular expressions (regexes) in the search string, so if you
+ are not interested in MEMORY_HOTPLUG, you could try
+
+ /^hotplug
+
______________________________________________________________________
-menuconfig User Interface Options
-----------------------------------------------------------------------
+User interface options for 'menuconfig'
+
+MENUCONFIG_COLOR
+--------------------------------------------------
+It is possible to select different color themes using the variable
+MENUCONFIG_COLOR. To select a theme use:
+
+ make MENUCONFIG_COLOR=<theme> menuconfig
+
+Available themes are:
+ mono => selects colors suitable for monochrome displays
+ blackbg => selects a color scheme with black background
+ classic => theme with blue background. The classic look
+ bluetitle => a LCD friendly version of classic. (default)
+
MENUCONFIG_MODE
--------------------------------------------------
This mode shows all sub-menus in one large tree.
Example:
- MENUCONFIG_MODE=single_menu make menuconfig
+ make MENUCONFIG_MODE=single_menu menuconfig
+
======================================================================
xconfig
diff --git a/Documentation/kbuild/modules.txt b/Documentation/kbuild/modules.txt
index b1096da953c..0767cf69c69 100644
--- a/Documentation/kbuild/modules.txt
+++ b/Documentation/kbuild/modules.txt
@@ -275,7 +275,7 @@ following files:
KERNELDIR := /lib/modules/`uname -r`/build
all::
- $(MAKE) -C $KERNELDIR M=`pwd` $@
+ $(MAKE) -C $(KERNELDIR) M=`pwd` $@
# Module specific targets
genbin:
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 3f4bc840da8..cab61d84225 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -108,7 +108,7 @@ There are two possible methods of using Kdump.
2) Or use the system kernel binary itself as dump-capture kernel and there is
no need to build a separate dump-capture kernel. This is possible
- only with the architecutres which support a relocatable kernel. As
+ only with the architectures which support a relocatable kernel. As
of today, i386, x86_64, ppc64 and ia64 architectures support relocatable
kernel.
@@ -222,7 +222,7 @@ Dump-capture kernel config options (Arch Dependent, ia64)
----------------------------------------------------------
- No specific options are required to create a dump-capture kernel
- for ia64, other than those specified in the arch idependent section
+ for ia64, other than those specified in the arch independent section
above. This means that it is possible to use the system kernel
as a dump-capture kernel if desired.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fd5cac01303..7936b801fe6 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -48,6 +48,7 @@ parameter is applicable:
EFI EFI Partitioning (GPT) is enabled
EIDE EIDE/ATAPI support is enabled.
FB The frame buffer device is enabled.
+ GCOV GCOV profiling is enabled.
HW Appropriate hardware is enabled.
IA-64 IA-64 architecture is enabled.
IMA Integrity measurement architecture is enabled.
@@ -56,7 +57,6 @@ parameter is applicable:
ISAPNP ISA PnP code is enabled.
ISDN Appropriate ISDN support is enabled.
JOY Appropriate joystick support is enabled.
- KMEMTRACE kmemtrace is enabled.
LIBATA Libata driver is enabled
LP Printer support is enabled.
LOOP Loopback device support is enabled.
@@ -229,14 +229,6 @@ and is between 256 and 4096 characters. It is defined in the file
to assume that this machine's pmtimer latches its value
and always returns good values.
- acpi.power_nocheck= [HW,ACPI]
- Format: 1/0 enable/disable the check of power state.
- On some bogus BIOS the _PSC object/_STA object of
- power resource can't return the correct device power
- state. In such case it is unneccessary to check its
- power state again in power transition.
- 1 : disable the power state check
-
acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode
Format: { level | edge | high | low }
@@ -329,11 +321,6 @@ and is between 256 and 4096 characters. It is defined in the file
flushed before they will be reused, which
is a lot of faster
- amd_iommu_size= [HW,X86-64]
- Define the size of the aperture for the AMD IOMMU
- driver. Possible values are:
- '32M', '64M' (default), '128M', '256M', '512M', '1G'
-
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: <a>,<b>
@@ -497,6 +484,13 @@ and is between 256 and 4096 characters. It is defined in the file
Also note the kernel might malfunction if you disable
some critical bits.
+ cmo_free_hint= [PPC] Format: { yes | no }
+ Specify whether pages are marked as being inactive
+ when they are freed. This is used in CMO environments
+ to determine OS memory pressure for page stealing by
+ a hypervisor.
+ Default: yes
+
code_bytes [X86] How many bytes of object code to print
in an oops report.
Range: 0 - 8192
@@ -545,6 +539,10 @@ and is between 256 and 4096 characters. It is defined in the file
console=brl,ttyS0
For now, only VisioBraille is supported.
+ consoleblank= [KNL] The console blank (screen saver) timeout in
+ seconds. Defaults to 10*60 = 10mins. A value of 0
+ disables the blank timer.
+
coredump_filter=
[KNL] Change the default value for
/proc/<pid>/coredump_filter.
@@ -646,6 +644,13 @@ and is between 256 and 4096 characters. It is defined in the file
DMA-API debugging code disables itself because the
architectural default is too low.
+ dma_debug_driver=<driver_name>
+ With this option the DMA-API debugging driver
+ filter feature can be enabled at boot time. Just
+ pass the driver to filter for as the parameter.
+ The filter can be disabled or changed to another
+ driver later using sysfs.
+
dscc4.setup= [NET]
dtc3181e= [HW,SCSI]
@@ -752,12 +757,25 @@ and is between 256 and 4096 characters. It is defined in the file
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
ftrace=[tracer]
- [ftrace] will set and start the specified tracer
+ [FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
boot debugging.
ftrace_dump_on_oops
- [ftrace] will dump the trace buffers on oops.
+ [FTRACE] will dump the trace buffers on oops.
+
+ ftrace_filter=[function-list]
+ [FTRACE] Limit the functions traced by the function
+ tracer at boot up. function-list is a comma separated
+ list of functions. This list can be changed at run
+ time by the set_ftrace_filter file in the debugfs
+ tracing directory.
+
+ ftrace_notrace=[function-list]
+ [FTRACE] Do not trace the functions specified in
+ function-list. This list can be changed at run time
+ by the set_ftrace_notrace file in the debugfs
+ tracing directory.
gamecon.map[2|3]=
[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
@@ -771,6 +789,12 @@ and is between 256 and 4096 characters. It is defined in the file
Format: off | on
default: on
+ gcov_persist= [GCOV] When non-zero (default), profiling data for
+ kernel modules is saved and remains accessible via
+ debugfs, even when the module is unloaded/reloaded.
+ When zero, profiling data is discarded and associated
+ debugfs files are removed at module unload time.
+
gdth= [HW,SCSI]
See header of drivers/scsi/gdth.c.
@@ -873,11 +897,8 @@ and is between 256 and 4096 characters. It is defined in the file
ide-core.nodma= [HW] (E)IDE subsystem
Format: =0.0 to prevent dma on hda, =0.1 hdb =1.0 hdc
- .vlb_clock .pci_clock .noflush .noprobe .nowerr .cdrom
- .chs .ignore_cable are additional options
- See Documentation/ide/ide.txt.
-
- idebus= [HW] (E)IDE subsystem - VLB/PCI bus speed
+ .vlb_clock .pci_clock .noflush .nohpa .noprobe .nowerr
+ .cdrom .chs .ignore_cable are additional options
See Documentation/ide/ide.txt.
ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem
@@ -914,6 +935,12 @@ and is between 256 and 4096 characters. It is defined in the file
Formt: { "sha1" | "md5" }
default: "sha1"
+ ima_tcb [IMA]
+ Load a policy which meets the needs of the Trusted
+ Computing Base. This means IMA will measure all
+ programs exec'd, files mmap'd for exec, and all files
+ opened for read by uid=0.
+
in2000= [HW,SCSI]
See header of drivers/scsi/in2000.c.
@@ -971,6 +998,7 @@ and is between 256 and 4096 characters. It is defined in the file
nomerge
forcesac
soft
+ pt [x86, IA64]
io7= [HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
@@ -1054,24 +1082,19 @@ and is between 256 and 4096 characters. It is defined in the file
use the HighMem zone if it exists, and the Normal
zone if it does not.
- kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no }
- Controls whether kmemtrace is enabled
- at boot-time.
-
- kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of
- subbufs kmemtrace's relay channel has. Set this
- higher than default (KMEMTRACE_N_SUBBUFS in code) if
- you experience buffer overruns.
-
kgdboc= [HW] kgdb over consoles.
Requires a tty driver that supports console polling.
- (only serial suported for now)
+ (only serial supported for now)
Format: <serial_device>[,baud]
kmac= [MIPS] korina ethernet MAC address.
Configure the RouterBoard 532 series on-chip
Ethernet adapter MAC address.
+ kmemleak= [KNL] Boot-time kmemleak enable/disable
+ Valid arguments: on, off
+ Default: on
+
kstack=N [X86] Print N words from the kernel stack
in oops dumps.
@@ -1092,6 +1115,10 @@ and is between 256 and 4096 characters. It is defined in the file
libata.dma=4 Compact Flash DMA only
Combinations also work, so libata.dma=3 enables DMA
for disks and CDROMs, but not CFs.
+
+ libata.ignore_hpa= [LIBATA] Ignore HPA limit
+ libata.ignore_hpa=0 keep BIOS limits (default)
+ libata.ignore_hpa=1 ignore limits, using full disk
libata.noacpi [LIBATA] Disables use of ACPI in libata suspend/resume
when set.
@@ -1339,6 +1366,27 @@ and is between 256 and 4096 characters. It is defined in the file
min_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory below this
physical address is ignored.
+ mini2440= [ARM,HW,KNL]
+ Format:[0..2][b][c][t]
+ Default: "0tb"
+ MINI2440 configuration specification:
+ 0 - The attached screen is the 3.5" TFT
+ 1 - The attached screen is the 7" TFT
+ 2 - The VGA Shield is attached (1024x768)
+ Leaving out the screen size parameter will not load
+ the TFT driver, and the framebuffer will be left
+ unconfigured.
+ b - Enable backlight. The TFT backlight pin will be
+ linked to the kernel VESA blanking code and a GPIO
+ LED. This parameter is not necessary when using the
+ VGA shield.
+ c - Enable the s3c camera interface.
+ t - Reserved for enabling touchscreen support. The
+ touchscreen support is not enabled in the mainstream
+ kernel as of 2.6.30, a preliminary port can be found
+ in the "bleeding edge" mini2440 support kernel at
+ http://repo.or.cz/w/linux-2.6/mini2440.git
+
mminit_loglevel=
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
parameter allows control of the logging verbosity for
@@ -1380,6 +1428,16 @@ and is between 256 and 4096 characters. It is defined in the file
mtdparts= [MTD]
See drivers/mtd/cmdlinepart.c.
+ onenand.bdry= [HW,MTD] Flex-OneNAND Boundary Configuration
+
+ Format: [die0_boundary][,die0_lock][,die1_boundary][,die1_lock]
+
+ boundary - index of last SLC block on Flex-OneNAND.
+ The remaining blocks are configured as MLC blocks.
+ lock - Configure if Flex-OneNAND boundary should be locked.
+ Once locked, the boundary cannot be changed.
+ 1 indicates lock status, 0 indicates unlock status.
+
mtdset= [ARM]
ARM/S3C2412 JIVE boot control
@@ -1390,7 +1448,7 @@ and is between 256 and 4096 characters. It is defined in the file
('y', default) or cooked coordinates ('n')
mtrr_chunk_size=nn[KMG] [X86]
- used for mtrr cleanup. It is largest continous chunk
+ used for mtrr cleanup. It is largest continuous chunk
that could hold holes aka. UC entries.
mtrr_gran_size=nn[KMG] [X86]
@@ -1575,6 +1633,9 @@ and is between 256 and 4096 characters. It is defined in the file
noinitrd [RAM] Tells the kernel not to load any configured
initial RAM disk.
+ nointremap [X86-64, Intel-IOMMU] Do not enable interrupt
+ remapping.
+
nointroute [IA-64]
nojitter [IA64] Disables jitter checking for ITC timers.
@@ -1660,6 +1721,14 @@ and is between 256 and 4096 characters. It is defined in the file
oprofile.timer= [HW]
Use timer interrupt instead of performance counters
+ oprofile.cpu_type= Force an oprofile cpu type
+ This might be useful if you have an older oprofile
+ userland or if you want common events.
+ Format: { arch_perfmon }
+ arch_perfmon: [X86] Force use of architectural
+ perfmon on Intel CPUs instead of the
+ CPU specific event set.
+
osst= [HW,SCSI] SCSI Tape Driver
Format: <buffer_size>,<write_threshold>
See also Documentation/scsi/st.txt.
@@ -1735,6 +1804,9 @@ and is between 256 and 4096 characters. It is defined in the file
root domains (aka PCI segments, in ACPI-speak).
nommconf [X86] Disable use of MMCONFIG for PCI
Configuration
+ check_enable_amd_mmconf [X86] check for and enable
+ properly configured MMIO access to PCI
+ config space on AMD family 10h CPU
nomsi [MSI] If the PCI_MSI kernel config parameter is
enabled, this kernel boot option can be used to
disable the use of MSI interrupts system-wide.
@@ -1824,6 +1896,12 @@ and is between 256 and 4096 characters. It is defined in the file
PAGE_SIZE is used as alignment.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
+ ecrc= Enable/disable PCIe ECRC (transaction layer
+ end-to-end CRC checking).
+ bios: Use BIOS/firmware settings. This is the
+ the default.
+ off: Turn ECRC off
+ on: Turn ECRC on.
pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power
Management.
@@ -1841,6 +1919,12 @@ and is between 256 and 4096 characters. It is defined in the file
Format: { 0 | 1 }
See arch/parisc/kernel/pdc_chassis.c
+ percpu_alloc= [X86] Select which percpu first chunk allocator to use.
+ Allowed values are one of "lpage", "embed" and "4k".
+ See comments in arch/x86/kernel/setup_percpu.c for
+ details on each allocator. This parameter is primarily
+ for debugging and performance comparison.
+
pf. [PARIDE]
See Documentation/blockdev/paride.txt.
@@ -2393,7 +2477,8 @@ and is between 256 and 4096 characters. It is defined in the file
tp720= [HW,PS2]
- trace_buf_size=nn[KMG] [ftrace] will set tracing buffer size.
+ trace_buf_size=nn[KMG]
+ [FTRACE] will set tracing buffer size.
trix= [HW,OSS] MediaTrix AudioTrix Pro
Format:
diff --git a/Documentation/kmemcheck.txt b/Documentation/kmemcheck.txt
new file mode 100644
index 00000000000..363044609da
--- /dev/null
+++ b/Documentation/kmemcheck.txt
@@ -0,0 +1,773 @@
+GETTING STARTED WITH KMEMCHECK
+==============================
+
+Vegard Nossum <vegardno@ifi.uio.no>
+
+
+Contents
+========
+0. Introduction
+1. Downloading
+2. Configuring and compiling
+3. How to use
+3.1. Booting
+3.2. Run-time enable/disable
+3.3. Debugging
+3.4. Annotating false positives
+4. Reporting errors
+5. Technical description
+
+
+0. Introduction
+===============
+
+kmemcheck is a debugging feature for the Linux Kernel. More specifically, it
+is a dynamic checker that detects and warns about some uses of uninitialized
+memory.
+
+Userspace programmers might be familiar with Valgrind's memcheck. The main
+difference between memcheck and kmemcheck is that memcheck works for userspace
+programs only, and kmemcheck works for the kernel only. The implementations
+are of course vastly different. Because of this, kmemcheck is not as accurate
+as memcheck, but it turns out to be good enough in practice to discover real
+programmer errors that the compiler is not able to find through static
+analysis.
+
+Enabling kmemcheck on a kernel will probably slow it down to the extent that
+the machine will not be usable for normal workloads such as e.g. an
+interactive desktop. kmemcheck will also cause the kernel to use about twice
+as much memory as normal. For this reason, kmemcheck is strictly a debugging
+feature.
+
+
+1. Downloading
+==============
+
+kmemcheck can only be downloaded using git. If you want to write patches
+against the current code, you should use the kmemcheck development branch of
+the tip tree. It is also possible to use the linux-next tree, which also
+includes the latest version of kmemcheck.
+
+Assuming that you've already cloned the linux-2.6.git repository, all you
+have to do is add the -tip tree as a remote, like this:
+
+ $ git remote add tip git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git
+
+To actually download the tree, fetch the remote:
+
+ $ git fetch tip
+
+And to check out a new local branch with the kmemcheck code:
+
+ $ git checkout -b kmemcheck tip/kmemcheck
+
+General instructions for the -tip tree can be found here:
+http://people.redhat.com/mingo/tip.git/readme.txt
+
+
+2. Configuring and compiling
+============================
+
+kmemcheck only works for the x86 (both 32- and 64-bit) platform. A number of
+configuration variables must have specific settings in order for the kmemcheck
+menu to even appear in "menuconfig". These are:
+
+ o CONFIG_CC_OPTIMIZE_FOR_SIZE=n
+
+ This option is located under "General setup" / "Optimize for size".
+
+ Without this, gcc will use certain optimizations that usually lead to
+ false positive warnings from kmemcheck. An example of this is a 16-bit
+ field in a struct, where gcc may load 32 bits, then discard the upper
+ 16 bits. kmemcheck sees only the 32-bit load, and may trigger a
+ warning for the upper 16 bits (if they're uninitialized).
+
+ o CONFIG_SLAB=y or CONFIG_SLUB=y
+
+ This option is located under "General setup" / "Choose SLAB
+ allocator".
+
+ o CONFIG_FUNCTION_TRACER=n
+
+ This option is located under "Kernel hacking" / "Tracers" / "Kernel
+ Function Tracer"
+
+ When function tracing is compiled in, gcc emits a call to another
+ function at the beginning of every function. This means that when the
+ page fault handler is called, the ftrace framework will be called
+ before kmemcheck has had a chance to handle the fault. If ftrace then
+ modifies memory that was tracked by kmemcheck, the result is an
+ endless recursive page fault.
+
+ o CONFIG_DEBUG_PAGEALLOC=n
+
+ This option is located under "Kernel hacking" / "Debug page memory
+ allocations".
+
+In addition, I highly recommend turning on CONFIG_DEBUG_INFO=y. This is also
+located under "Kernel hacking". With this, you will be able to get line number
+information from the kmemcheck warnings, which is extremely valuable in
+debugging a problem. This option is not mandatory, however, because it slows
+down the compilation process and produces a much bigger kernel image.
+
+Now the kmemcheck menu should be visible (under "Kernel hacking" / "kmemcheck:
+trap use of uninitialized memory"). Here follows a description of the
+kmemcheck configuration variables:
+
+ o CONFIG_KMEMCHECK
+
+ This must be enabled in order to use kmemcheck at all...
+
+ o CONFIG_KMEMCHECK_[DISABLED | ENABLED | ONESHOT]_BY_DEFAULT
+
+ This option controls the status of kmemcheck at boot-time. "Enabled"
+ will enable kmemcheck right from the start, "disabled" will boot the
+ kernel as normal (but with the kmemcheck code compiled in, so it can
+ be enabled at run-time after the kernel has booted), and "one-shot" is
+ a special mode which will turn kmemcheck off automatically after
+ detecting the first use of uninitialized memory.
+
+ If you are using kmemcheck to actively debug a problem, then you
+ probably want to choose "enabled" here.
+
+ The one-shot mode is mostly useful in automated test setups because it
+ can prevent floods of warnings and increase the chances of the machine
+ surviving in case something is really wrong. In other cases, the one-
+ shot mode could actually be counter-productive because it would turn
+ itself off at the very first error -- in the case of a false positive
+ too -- and this would come in the way of debugging the specific
+ problem you were interested in.
+
+ If you would like to use your kernel as normal, but with a chance to
+ enable kmemcheck in case of some problem, it might be a good idea to
+ choose "disabled" here. When kmemcheck is disabled, most of the run-
+ time overhead is not incurred, and the kernel will be almost as fast
+ as normal.
+
+ o CONFIG_KMEMCHECK_QUEUE_SIZE
+
+ Select the maximum number of error reports to store in an internal
+ (fixed-size) buffer. Since errors can occur virtually anywhere and in
+ any context, we need a temporary storage area which is guaranteed not
+ to generate any other page faults when accessed. The queue will be
+ emptied as soon as a tasklet may be scheduled. If the queue is full,
+ new error reports will be lost.
+
+ The default value of 64 is probably fine. If some code produces more
+ than 64 errors within an irqs-off section, then the code is likely to
+ produce many, many more, too, and these additional reports seldom give
+ any more information (the first report is usually the most valuable
+ anyway).
+
+ This number might have to be adjusted if you are not using serial
+ console or similar to capture the kernel log. If you are using the
+ "dmesg" command to save the log, then getting a lot of kmemcheck
+ warnings might overflow the kernel log itself, and the earlier reports
+ will get lost in that way instead. Try setting this to 10 or so on
+ such a setup.
+
+ o CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT
+
+ Select the number of shadow bytes to save along with each entry of the
+ error-report queue. These bytes indicate what parts of an allocation
+ are initialized, uninitialized, etc. and will be displayed when an
+ error is detected to help the debugging of a particular problem.
+
+ The number entered here is actually the logarithm of the number of
+ bytes that will be saved. So if you pick for example 5 here, kmemcheck
+ will save 2^5 = 32 bytes.
+
+ The default value should be fine for debugging most problems. It also
+ fits nicely within 80 columns.
+
+ o CONFIG_KMEMCHECK_PARTIAL_OK
+
+ This option (when enabled) works around certain GCC optimizations that
+ produce 32-bit reads from 16-bit variables where the upper 16 bits are
+ thrown away afterwards.
+
+ The default value (enabled) is recommended. This may of course hide
+ some real errors, but disabling it would probably produce a lot of
+ false positives.
+
+ o CONFIG_KMEMCHECK_BITOPS_OK
+
+ This option silences warnings that would be generated for bit-field
+ accesses where not all the bits are initialized at the same time. This
+ may also hide some real bugs.
+
+ This option is probably obsolete, or it should be replaced with
+ the kmemcheck-/bitfield-annotations for the code in question. The
+ default value is therefore fine.
+
+Now compile the kernel as usual.
+
+
+3. How to use
+=============
+
+3.1. Booting
+============
+
+First some information about the command-line options. There is only one
+option specific to kmemcheck, and this is called "kmemcheck". It can be used
+to override the default mode as chosen by the CONFIG_KMEMCHECK_*_BY_DEFAULT
+option. Its possible settings are:
+
+ o kmemcheck=0 (disabled)
+ o kmemcheck=1 (enabled)
+ o kmemcheck=2 (one-shot mode)
+
+If SLUB debugging has been enabled in the kernel, it may take precedence over
+kmemcheck in such a way that the slab caches which are under SLUB debugging
+will not be tracked by kmemcheck. In order to ensure that this doesn't happen
+(even though it shouldn't by default), use SLUB's boot option "slub_debug",
+like this: slub_debug=-
+
+In fact, this option may also be used for fine-grained control over SLUB vs.
+kmemcheck. For example, if the command line includes "kmemcheck=1
+slub_debug=,dentry", then SLUB debugging will be used only for the "dentry"
+slab cache, and with kmemcheck tracking all the other caches. This is advanced
+usage, however, and is not generally recommended.
+
+
+3.2. Run-time enable/disable
+============================
+
+When the kernel has booted, it is possible to enable or disable kmemcheck at
+run-time. WARNING: This feature is still experimental and may cause false
+positive warnings to appear. Therefore, try not to use this. If you find that
+it doesn't work properly (e.g. you see an unreasonable amount of warnings), I
+will be happy to take bug reports.
+
+Use the file /proc/sys/kernel/kmemcheck for this purpose, e.g.:
+
+ $ echo 0 > /proc/sys/kernel/kmemcheck # disables kmemcheck
+
+The numbers are the same as for the kmemcheck= command-line option.
+
+
+3.3. Debugging
+==============
+
+A typical report will look something like this:
+
+WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
+80000000000000000000000000000000000000000088ffff0000000000000000
+ i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
+ ^
+
+Pid: 1856, comm: ntpdate Not tainted 2.6.29-rc5 #264 945P-A
+RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
+RSP: 0018:ffff88003cdf7d98 EFLAGS: 00210002
+RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
+RDX: ffff88003e5d6018 RSI: ffff88003e5d6024 RDI: ffff88003cdf7e84
+RBP: ffff88003cdf7db8 R08: ffff88003e5d6000 R09: 0000000000000000
+R10: 0000000000000080 R11: 0000000000000000 R12: 000000000000000e
+R13: ffff88003cdf7e78 R14: ffff88003d530710 R15: ffff88003d5a98c8
+FS: 0000000000000000(0000) GS:ffff880001982000(0063) knlGS:00000
+CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
+CR2: ffff88003f806ea0 CR3: 000000003c036000 CR4: 00000000000006a0
+DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
+DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
+ [<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
+ [<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
+ [<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
+ [<ffffffff8100c7b5>] int_signal+0x12/0x17
+ [<ffffffffffffffff>] 0xffffffffffffffff
+
+The single most valuable information in this report is the RIP (or EIP on 32-
+bit) value. This will help us pinpoint exactly which instruction that caused
+the warning.
+
+If your kernel was compiled with CONFIG_DEBUG_INFO=y, then all we have to do
+is give this address to the addr2line program, like this:
+
+ $ addr2line -e vmlinux -i ffffffff8104ede8
+ arch/x86/include/asm/string_64.h:12
+ include/asm-generic/siginfo.h:287
+ kernel/signal.c:380
+ kernel/signal.c:410
+
+The "-e vmlinux" tells addr2line which file to look in. IMPORTANT: This must
+be the vmlinux of the kernel that produced the warning in the first place! If
+not, the line number information will almost certainly be wrong.
+
+The "-i" tells addr2line to also print the line numbers of inlined functions.
+In this case, the flag was very important, because otherwise, it would only
+have printed the first line, which is just a call to memcpy(), which could be
+called from a thousand places in the kernel, and is therefore not very useful.
+These inlined functions would not show up in the stack trace above, simply
+because the kernel doesn't load the extra debugging information. This
+technique can of course be used with ordinary kernel oopses as well.
+
+In this case, it's the caller of memcpy() that is interesting, and it can be
+found in include/asm-generic/siginfo.h, line 287:
+
+281 static inline void copy_siginfo(struct siginfo *to, struct siginfo *from)
+282 {
+283 if (from->si_code < 0)
+284 memcpy(to, from, sizeof(*to));
+285 else
+286 /* _sigchld is currently the largest know union member */
+287 memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld));
+288 }
+
+Since this was a read (kmemcheck usually warns about reads only, though it can
+warn about writes to unallocated or freed memory as well), it was probably the
+"from" argument which contained some uninitialized bytes. Following the chain
+of calls, we move upwards to see where "from" was allocated or initialized,
+kernel/signal.c, line 380:
+
+359 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info)
+360 {
+...
+367 list_for_each_entry(q, &list->list, list) {
+368 if (q->info.si_signo == sig) {
+369 if (first)
+370 goto still_pending;
+371 first = q;
+...
+377 if (first) {
+378 still_pending:
+379 list_del_init(&first->list);
+380 copy_siginfo(info, &first->info);
+381 __sigqueue_free(first);
+...
+392 }
+393 }
+
+Here, it is &first->info that is being passed on to copy_siginfo(). The
+variable "first" was found on a list -- passed in as the second argument to
+collect_signal(). We continue our journey through the stack, to figure out
+where the item on "list" was allocated or initialized. We move to line 410:
+
+395 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
+396 siginfo_t *info)
+397 {
+...
+410 collect_signal(sig, pending, info);
+...
+414 }
+
+Now we need to follow the "pending" pointer, since that is being passed on to
+collect_signal() as "list". At this point, we've run out of lines from the
+"addr2line" output. Not to worry, we just paste the next addresses from the
+kmemcheck stack dump, i.e.:
+
+ [<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
+ [<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
+ [<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
+ [<ffffffff8100c7b5>] int_signal+0x12/0x17
+
+ $ addr2line -e vmlinux -i ffffffff8104f04e ffffffff81050bd8 \
+ ffffffff8100b87d ffffffff8100c7b5
+ kernel/signal.c:446
+ kernel/signal.c:1806
+ arch/x86/kernel/signal.c:805
+ arch/x86/kernel/signal.c:871
+ arch/x86/kernel/entry_64.S:694
+
+Remember that since these addresses were found on the stack and not as the
+RIP value, they actually point to the _next_ instruction (they are return
+addresses). This becomes obvious when we look at the code for line 446:
+
+422 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
+423 {
+...
+431 signr = __dequeue_signal(&tsk->signal->shared_pending,
+432 mask, info);
+433 /*
+434 * itimer signal ?
+435 *
+436 * itimers are process shared and we restart periodic
+437 * itimers in the signal delivery path to prevent DoS
+438 * attacks in the high resolution timer case. This is
+439 * compliant with the old way of self restarting
+440 * itimers, as the SIGALRM is a legacy signal and only
+441 * queued once. Changing the restart behaviour to
+442 * restart the timer in the signal dequeue path is
+443 * reducing the timer noise on heavy loaded !highres
+444 * systems too.
+445 */
+446 if (unlikely(signr == SIGALRM)) {
+...
+489 }
+
+So instead of looking at 446, we should be looking at 431, which is the line
+that executes just before 446. Here we see that what we are looking for is
+&tsk->signal->shared_pending.
+
+Our next task is now to figure out which function that puts items on this
+"shared_pending" list. A crude, but efficient tool, is git grep:
+
+ $ git grep -n 'shared_pending' kernel/
+ ...
+ kernel/signal.c:828: pending = group ? &t->signal->shared_pending : &t->pending;
+ kernel/signal.c:1339: pending = group ? &t->signal->shared_pending : &t->pending;
+ ...
+
+There were more results, but none of them were related to list operations,
+and these were the only assignments. We inspect the line numbers more closely
+and find that this is indeed where items are being added to the list:
+
+816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
+817 int group)
+818 {
+...
+828 pending = group ? &t->signal->shared_pending : &t->pending;
+...
+851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
+852 (is_si_special(info) ||
+853 info->si_code >= 0)));
+854 if (q) {
+855 list_add_tail(&q->list, &pending->list);
+...
+890 }
+
+and:
+
+1309 int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
+1310 {
+....
+1339 pending = group ? &t->signal->shared_pending : &t->pending;
+1340 list_add_tail(&q->list, &pending->list);
+....
+1347 }
+
+In the first case, the list element we are looking for, "q", is being returned
+from the function __sigqueue_alloc(), which looks like an allocation function.
+Let's take a look at it:
+
+187 static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
+188 int override_rlimit)
+189 {
+190 struct sigqueue *q = NULL;
+191 struct user_struct *user;
+192
+193 /*
+194 * We won't get problems with the target's UID changing under us
+195 * because changing it requires RCU be used, and if t != current, the
+196 * caller must be holding the RCU readlock (by way of a spinlock) and
+197 * we use RCU protection here
+198 */
+199 user = get_uid(__task_cred(t)->user);
+200 atomic_inc(&user->sigpending);
+201 if (override_rlimit ||
+202 atomic_read(&user->sigpending) <=
+203 t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
+204 q = kmem_cache_alloc(sigqueue_cachep, flags);
+205 if (unlikely(q == NULL)) {
+206 atomic_dec(&user->sigpending);
+207 free_uid(user);
+208 } else {
+209 INIT_LIST_HEAD(&q->list);
+210 q->flags = 0;
+211 q->user = user;
+212 }
+213
+214 return q;
+215 }
+
+We see that this function initializes q->list, q->flags, and q->user. It seems
+that now is the time to look at the definition of "struct sigqueue", e.g.:
+
+14 struct sigqueue {
+15 struct list_head list;
+16 int flags;
+17 siginfo_t info;
+18 struct user_struct *user;
+19 };
+
+And, you might remember, it was a memcpy() on &first->info that caused the
+warning, so this makes perfect sense. It also seems reasonable to assume that
+it is the caller of __sigqueue_alloc() that has the responsibility of filling
+out (initializing) this member.
+
+But just which fields of the struct were uninitialized? Let's look at
+kmemcheck's report again:
+
+WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
+80000000000000000000000000000000000000000088ffff0000000000000000
+ i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
+ ^
+
+These first two lines are the memory dump of the memory object itself, and the
+shadow bytemap, respectively. The memory object itself is in this case
+&first->info. Just beware that the start of this dump is NOT the start of the
+object itself! The position of the caret (^) corresponds with the address of
+the read (ffff88003e4a2024).
+
+The shadow bytemap dump legend is as follows:
+
+ i - initialized
+ u - uninitialized
+ a - unallocated (memory has been allocated by the slab layer, but has not
+ yet been handed off to anybody)
+ f - freed (memory has been allocated by the slab layer, but has been freed
+ by the previous owner)
+
+In order to figure out where (relative to the start of the object) the
+uninitialized memory was located, we have to look at the disassembly. For
+that, we'll need the RIP address again:
+
+RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
+
+ $ objdump -d --no-show-raw-insn vmlinux | grep -C 8 ffffffff8104ede8:
+ ffffffff8104edc8: mov %r8,0x8(%r8)
+ ffffffff8104edcc: test %r10d,%r10d
+ ffffffff8104edcf: js ffffffff8104ee88 <__dequeue_signal+0x168>
+ ffffffff8104edd5: mov %rax,%rdx
+ ffffffff8104edd8: mov $0xc,%ecx
+ ffffffff8104eddd: mov %r13,%rdi
+ ffffffff8104ede0: mov $0x30,%eax
+ ffffffff8104ede5: mov %rdx,%rsi
+ ffffffff8104ede8: rep movsl %ds:(%rsi),%es:(%rdi)
+ ffffffff8104edea: test $0x2,%al
+ ffffffff8104edec: je ffffffff8104edf0 <__dequeue_signal+0xd0>
+ ffffffff8104edee: movsw %ds:(%rsi),%es:(%rdi)
+ ffffffff8104edf0: test $0x1,%al
+ ffffffff8104edf2: je ffffffff8104edf5 <__dequeue_signal+0xd5>
+ ffffffff8104edf4: movsb %ds:(%rsi),%es:(%rdi)
+ ffffffff8104edf5: mov %r8,%rdi
+ ffffffff8104edf8: callq ffffffff8104de60 <__sigqueue_free>
+
+As expected, it's the "rep movsl" instruction from the memcpy() that causes
+the warning. We know about REP MOVSL that it uses the register RCX to count
+the number of remaining iterations. By taking a look at the register dump
+again (from the kmemcheck report), we can figure out how many bytes were left
+to copy:
+
+RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
+
+By looking at the disassembly, we also see that %ecx is being loaded with the
+value $0xc just before (ffffffff8104edd8), so we are very lucky. Keep in mind
+that this is the number of iterations, not bytes. And since this is a "long"
+operation, we need to multiply by 4 to get the number of bytes. So this means
+that the uninitialized value was encountered at 4 * (0xc - 0x9) = 12 bytes
+from the start of the object.
+
+We can now try to figure out which field of the "struct siginfo" that was not
+initialized. This is the beginning of the struct:
+
+40 typedef struct siginfo {
+41 int si_signo;
+42 int si_errno;
+43 int si_code;
+44
+45 union {
+..
+92 } _sifields;
+93 } siginfo_t;
+
+On 64-bit, the int is 4 bytes long, so it must the the union member that has
+not been initialized. We can verify this using gdb:
+
+ $ gdb vmlinux
+ ...
+ (gdb) p &((struct siginfo *) 0)->_sifields
+ $1 = (union {...} *) 0x10
+
+Actually, it seems that the union member is located at offset 0x10 -- which
+means that gcc has inserted 4 bytes of padding between the members si_code
+and _sifields. We can now get a fuller picture of the memory dump:
+
+ _----------------------------=> si_code
+ / _--------------------=> (padding)
+ | / _------------=> _sifields(._kill._pid)
+ | | / _----=> _sifields(._kill._uid)
+ | | | /
+-------|-------|-------|-------|
+80000000000000000000000000000000000000000088ffff0000000000000000
+ i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
+
+This allows us to realize another important fact: si_code contains the value
+0x80. Remember that x86 is little endian, so the first 4 bytes "80000000" are
+really the number 0x00000080. With a bit of research, we find that this is
+actually the constant SI_KERNEL defined in include/asm-generic/siginfo.h:
+
+144 #define SI_KERNEL 0x80 /* sent by the kernel from somewhere */
+
+This macro is used in exactly one place in the x86 kernel: In send_signal()
+in kernel/signal.c:
+
+816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
+817 int group)
+818 {
+...
+828 pending = group ? &t->signal->shared_pending : &t->pending;
+...
+851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
+852 (is_si_special(info) ||
+853 info->si_code >= 0)));
+854 if (q) {
+855 list_add_tail(&q->list, &pending->list);
+856 switch ((unsigned long) info) {
+...
+865 case (unsigned long) SEND_SIG_PRIV:
+866 q->info.si_signo = sig;
+867 q->info.si_errno = 0;
+868 q->info.si_code = SI_KERNEL;
+869 q->info.si_pid = 0;
+870 q->info.si_uid = 0;
+871 break;
+...
+890 }
+
+Not only does this match with the .si_code member, it also matches the place
+we found earlier when looking for where siginfo_t objects are enqueued on the
+"shared_pending" list.
+
+So to sum up: It seems that it is the padding introduced by the compiler
+between two struct fields that is uninitialized, and this gets reported when
+we do a memcpy() on the struct. This means that we have identified a false
+positive warning.
+
+Normally, kmemcheck will not report uninitialized accesses in memcpy() calls
+when both the source and destination addresses are tracked. (Instead, we copy
+the shadow bytemap as well). In this case, the destination address clearly
+was not tracked. We can dig a little deeper into the stack trace from above:
+
+ arch/x86/kernel/signal.c:805
+ arch/x86/kernel/signal.c:871
+ arch/x86/kernel/entry_64.S:694
+
+And we clearly see that the destination siginfo object is located on the
+stack:
+
+782 static void do_signal(struct pt_regs *regs)
+783 {
+784 struct k_sigaction ka;
+785 siginfo_t info;
+...
+804 signr = get_signal_to_deliver(&info, &ka, regs, NULL);
+...
+854 }
+
+And this &info is what eventually gets passed to copy_siginfo() as the
+destination argument.
+
+Now, even though we didn't find an actual error here, the example is still a
+good one, because it shows how one would go about to find out what the report
+was all about.
+
+
+3.4. Annotating false positives
+===============================
+
+There are a few different ways to make annotations in the source code that
+will keep kmemcheck from checking and reporting certain allocations. Here
+they are:
+
+ o __GFP_NOTRACK_FALSE_POSITIVE
+
+ This flag can be passed to kmalloc() or kmem_cache_alloc() (therefore
+ also to other functions that end up calling one of these) to indicate
+ that the allocation should not be tracked because it would lead to
+ a false positive report. This is a "big hammer" way of silencing
+ kmemcheck; after all, even if the false positive pertains to
+ particular field in a struct, for example, we will now lose the
+ ability to find (real) errors in other parts of the same struct.
+
+ Example:
+
+ /* No warnings will ever trigger on accessing any part of x */
+ x = kmalloc(sizeof *x, GFP_KERNEL | __GFP_NOTRACK_FALSE_POSITIVE);
+
+ o kmemcheck_bitfield_begin(name)/kmemcheck_bitfield_end(name) and
+ kmemcheck_annotate_bitfield(ptr, name)
+
+ The first two of these three macros can be used inside struct
+ definitions to signal, respectively, the beginning and end of a
+ bitfield. Additionally, this will assign the bitfield a name, which
+ is given as an argument to the macros.
+
+ Having used these markers, one can later use
+ kmemcheck_annotate_bitfield() at the point of allocation, to indicate
+ which parts of the allocation is part of a bitfield.
+
+ Example:
+
+ struct foo {
+ int x;
+
+ kmemcheck_bitfield_begin(flags);
+ int flag_a:1;
+ int flag_b:1;
+ kmemcheck_bitfield_end(flags);
+
+ int y;
+ };
+
+ struct foo *x = kmalloc(sizeof *x);
+
+ /* No warnings will trigger on accessing the bitfield of x */
+ kmemcheck_annotate_bitfield(x, flags);
+
+ Note that kmemcheck_annotate_bitfield() can be used even before the
+ return value of kmalloc() is checked -- in other words, passing NULL
+ as the first argument is legal (and will do nothing).
+
+
+4. Reporting errors
+===================
+
+As we have seen, kmemcheck will produce false positive reports. Therefore, it
+is not very wise to blindly post kmemcheck warnings to mailing lists and
+maintainers. Instead, I encourage maintainers and developers to find errors
+in their own code. If you get a warning, you can try to work around it, try
+to figure out if it's a real error or not, or simply ignore it. Most
+developers know their own code and will quickly and efficiently determine the
+root cause of a kmemcheck report. This is therefore also the most efficient
+way to work with kmemcheck.
+
+That said, we (the kmemcheck maintainers) will always be on the lookout for
+false positives that we can annotate and silence. So whatever you find,
+please drop us a note privately! Kernel configs and steps to reproduce (if
+available) are of course a great help too.
+
+Happy hacking!
+
+
+5. Technical description
+========================
+
+kmemcheck works by marking memory pages non-present. This means that whenever
+somebody attempts to access the page, a page fault is generated. The page
+fault handler notices that the page was in fact only hidden, and so it calls
+on the kmemcheck code to make further investigations.
+
+When the investigations are completed, kmemcheck "shows" the page by marking
+it present (as it would be under normal circumstances). This way, the
+interrupted code can continue as usual.
+
+But after the instruction has been executed, we should hide the page again, so
+that we can catch the next access too! Now kmemcheck makes use of a debugging
+feature of the processor, namely single-stepping. When the processor has
+finished the one instruction that generated the memory access, a debug
+exception is raised. From here, we simply hide the page again and continue
+execution, this time with the single-stepping feature turned off.
+
+kmemcheck requires some assistance from the memory allocator in order to work.
+The memory allocator needs to
+
+ 1. Tell kmemcheck about newly allocated pages and pages that are about to
+ be freed. This allows kmemcheck to set up and tear down the shadow memory
+ for the pages in question. The shadow memory stores the status of each
+ byte in the allocation proper, e.g. whether it is initialized or
+ uninitialized.
+
+ 2. Tell kmemcheck which parts of memory should be marked uninitialized.
+ There are actually a few more states, such as "not yet allocated" and
+ "recently freed".
+
+If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
+memory that can take page faults because of kmemcheck.
+
+If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
+request memory with the __GFP_NOTRACK or __GFP_NOTRACK_FALSE_POSITIVE flags.
+This does not prevent the page faults from occurring, however, but marks the
+object in question as being initialized so that no warnings will ever be
+produced for this object.
+
+Currently, the SLAB and SLUB allocators are supported by kmemcheck.
diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt
new file mode 100644
index 00000000000..89068030b01
--- /dev/null
+++ b/Documentation/kmemleak.txt
@@ -0,0 +1,151 @@
+Kernel Memory Leak Detector
+===========================
+
+Introduction
+------------
+
+Kmemleak provides a way of detecting possible kernel memory leaks in a
+way similar to a tracing garbage collector
+(http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage_collectors),
+with the difference that the orphan objects are not freed but only
+reported via /sys/kernel/debug/kmemleak. A similar method is used by the
+Valgrind tool (memcheck --leak-check) to detect the memory leaks in
+user-space applications.
+
+Usage
+-----
+
+CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel
+thread scans the memory every 10 minutes (by default) and prints the
+number of new unreferenced objects found. To display the details of all
+the possible memory leaks:
+
+ # mount -t debugfs nodev /sys/kernel/debug/
+ # cat /sys/kernel/debug/kmemleak
+
+To trigger an intermediate memory scan:
+
+ # echo scan > /sys/kernel/debug/kmemleak
+
+Note that the orphan objects are listed in the order they were allocated
+and one object at the beginning of the list may cause other subsequent
+objects to be reported as orphan.
+
+Memory scanning parameters can be modified at run-time by writing to the
+/sys/kernel/debug/kmemleak file. The following parameters are supported:
+
+ off - disable kmemleak (irreversible)
+ stack=on - enable the task stacks scanning (default)
+ stack=off - disable the tasks stacks scanning
+ scan=on - start the automatic memory scanning thread (default)
+ scan=off - stop the automatic memory scanning thread
+ scan=<secs> - set the automatic memory scanning period in seconds
+ (default 600, 0 to stop the automatic scanning)
+ scan - trigger a memory scan
+
+Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on
+the kernel command line.
+
+Memory may be allocated or freed before kmemleak is initialised and
+these actions are stored in an early log buffer. The size of this buffer
+is configured via the CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option.
+
+Basic Algorithm
+---------------
+
+The memory allocations via kmalloc, vmalloc, kmem_cache_alloc and
+friends are traced and the pointers, together with additional
+information like size and stack trace, are stored in a prio search tree.
+The corresponding freeing function calls are tracked and the pointers
+removed from the kmemleak data structures.
+
+An allocated block of memory is considered orphan if no pointer to its
+start address or to any location inside the block can be found by
+scanning the memory (including saved registers). This means that there
+might be no way for the kernel to pass the address of the allocated
+block to a freeing function and therefore the block is considered a
+memory leak.
+
+The scanning algorithm steps:
+
+ 1. mark all objects as white (remaining white objects will later be
+ considered orphan)
+ 2. scan the memory starting with the data section and stacks, checking
+ the values against the addresses stored in the prio search tree. If
+ a pointer to a white object is found, the object is added to the
+ gray list
+ 3. scan the gray objects for matching addresses (some white objects
+ can become gray and added at the end of the gray list) until the
+ gray set is finished
+ 4. the remaining white objects are considered orphan and reported via
+ /sys/kernel/debug/kmemleak
+
+Some allocated memory blocks have pointers stored in the kernel's
+internal data structures and they cannot be detected as orphans. To
+avoid this, kmemleak can also store the number of values pointing to an
+address inside the block address range that need to be found so that the
+block is not considered a leak. One example is __vmalloc().
+
+Kmemleak API
+------------
+
+See the include/linux/kmemleak.h header for the functions prototype.
+
+kmemleak_init - initialize kmemleak
+kmemleak_alloc - notify of a memory block allocation
+kmemleak_free - notify of a memory block freeing
+kmemleak_not_leak - mark an object as not a leak
+kmemleak_ignore - do not scan or report an object as leak
+kmemleak_scan_area - add scan areas inside a memory block
+kmemleak_no_scan - do not scan a memory block
+kmemleak_erase - erase an old value in a pointer variable
+kmemleak_alloc_recursive - as kmemleak_alloc but checks the recursiveness
+kmemleak_free_recursive - as kmemleak_free but checks the recursiveness
+
+Dealing with false positives/negatives
+--------------------------------------
+
+The false negatives are real memory leaks (orphan objects) but not
+reported by kmemleak because values found during the memory scanning
+point to such objects. To reduce the number of false negatives, kmemleak
+provides the kmemleak_ignore, kmemleak_scan_area, kmemleak_no_scan and
+kmemleak_erase functions (see above). The task stacks also increase the
+amount of false negatives and their scanning is not enabled by default.
+
+The false positives are objects wrongly reported as being memory leaks
+(orphan). For objects known not to be leaks, kmemleak provides the
+kmemleak_not_leak function. The kmemleak_ignore could also be used if
+the memory block is known not to contain other pointers and it will no
+longer be scanned.
+
+Some of the reported leaks are only transient, especially on SMP
+systems, because of pointers temporarily stored in CPU registers or
+stacks. Kmemleak defines MSECS_MIN_AGE (defaulting to 1000) representing
+the minimum age of an object to be reported as a memory leak.
+
+Limitations and Drawbacks
+-------------------------
+
+The main drawback is the reduced performance of memory allocation and
+freeing. To avoid other penalties, the memory scanning is only performed
+when the /sys/kernel/debug/kmemleak file is read. Anyway, this tool is
+intended for debugging purposes where the performance might not be the
+most important requirement.
+
+To keep the algorithm simple, kmemleak scans for values pointing to any
+address inside a block's address range. This may lead to an increased
+number of false negatives. However, it is likely that a real memory leak
+will eventually become visible.
+
+Another source of false negatives is the data stored in non-pointer
+values. In a future version, kmemleak could only scan the pointer
+members in the allocated structures. This feature would solve many of
+the false negative cases described above.
+
+The tool can report false positives. These are cases where an allocated
+block doesn't need to be freed (some cases in the init_call functions),
+the pointer is calculated by other methods than the usual container_of
+macro or the pointer is stored in a location not scanned by kmemleak.
+
+Page allocations and ioremap are not tracked. Only the ARM and x86
+architectures are currently supported.
diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt
index b2e374586bd..c79ab996dad 100644
--- a/Documentation/kobject.txt
+++ b/Documentation/kobject.txt
@@ -132,7 +132,7 @@ kobject_name():
const char *kobject_name(const struct kobject * kobj);
There is a helper function to both initialize and add the kobject to the
-kernel at the same time, called supprisingly enough kobject_init_and_add():
+kernel at the same time, called surprisingly enough kobject_init_and_add():
int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype,
struct kobject *parent, const char *fmt, ...);
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 1e7a769a10f..053037a1fe6 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -507,9 +507,9 @@ http://www.linuxsymposium.org/2006/linuxsymposium_procv2.pdf (pages 101-115)
Appendix A: The kprobes debugfs interface
With recent kernels (> 2.6.20) the list of registered kprobes is visible
-under the /debug/kprobes/ directory (assuming debugfs is mounted at /debug).
+under the /sys/kernel/debug/kprobes/ directory (assuming debugfs is mounted at //sys/kernel/debug).
-/debug/kprobes/list: Lists all registered probes on the system
+/sys/kernel/debug/kprobes/list: Lists all registered probes on the system
c015d71a k vfs_read+0x0
c011a316 j do_fork+0x0
@@ -525,7 +525,7 @@ virtual addresses that correspond to modules that've been unloaded),
such probes are marked with [GONE]. If the probe is temporarily disabled,
such probes are marked with [DISABLED].
-/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.
+/sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.
Provides a knob to globally and forcibly turn registered kprobes ON or OFF.
By default, all kprobes are enabled. By echoing "0" to this file, all
diff --git a/Documentation/laptops/acer-wmi.txt b/Documentation/laptops/acer-wmi.txt
index 5ee2a02b3b4..0768fcc3ba3 100644
--- a/Documentation/laptops/acer-wmi.txt
+++ b/Documentation/laptops/acer-wmi.txt
@@ -40,7 +40,7 @@ NOTE: The Acer Aspire One is not supported hardware. It cannot work with
acer-wmi until Acer fix their ACPI-WMI implementation on them, so has been
blacklisted until that happens.
-Please see the website for the current list of known working hardare:
+Please see the website for the current list of known working hardware:
http://code.google.com/p/aceracpi/wiki/SupportedHardware
diff --git a/Documentation/laptops/sony-laptop.txt b/Documentation/laptops/sony-laptop.txt
index 8b2bc1572d9..23ce7d350d1 100644
--- a/Documentation/laptops/sony-laptop.txt
+++ b/Documentation/laptops/sony-laptop.txt
@@ -22,7 +22,7 @@ If your laptop model supports it, you will find sysfs files in the
/sys/class/backlight/sony/
directory. You will be able to query and set the current screen
brightness:
- brightness get/set screen brightness (an iteger
+ brightness get/set screen brightness (an integer
between 0 and 7)
actual_brightness reading from this file will query the HW
to get real brightness value
diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt
index e7e9a69069e..e2ddcdeb61b 100644
--- a/Documentation/laptops/thinkpad-acpi.txt
+++ b/Documentation/laptops/thinkpad-acpi.txt
@@ -36,8 +36,6 @@ detailed description):
- Bluetooth enable and disable
- video output switching, expansion control
- ThinkLight on and off
- - limited docking and undocking
- - UltraBay eject
- CMOS/UCMS control
- LED control
- ACPI sounds
@@ -506,7 +504,7 @@ generate input device EV_KEY events.
In addition to the EV_KEY events, thinkpad-acpi may also issue EV_SW
events for switches:
-SW_RFKILL_ALL T60 and later hardare rfkill rocker switch
+SW_RFKILL_ALL T60 and later hardware rfkill rocker switch
SW_TABLET_MODE Tablet ThinkPads HKEY events 0x5009 and 0x500A
Non hot-key ACPI HKEY event map:
@@ -729,131 +727,6 @@ cannot be read or if it is unknown, thinkpad-acpi will report it as "off".
It is impossible to know if the status returned through sysfs is valid.
-Docking / undocking -- /proc/acpi/ibm/dock
-------------------------------------------
-
-Docking and undocking (e.g. with the X4 UltraBase) requires some
-actions to be taken by the operating system to safely make or break
-the electrical connections with the dock.
-
-The docking feature of this driver generates the following ACPI events:
-
- ibm/dock GDCK 00000003 00000001 -- eject request
- ibm/dock GDCK 00000003 00000002 -- undocked
- ibm/dock GDCK 00000000 00000003 -- docked
-
-NOTE: These events will only be generated if the laptop was docked
-when originally booted. This is due to the current lack of support for
-hot plugging of devices in the Linux ACPI framework. If the laptop was
-booted while not in the dock, the following message is shown in the
-logs:
-
- Mar 17 01:42:34 aero kernel: thinkpad_acpi: dock device not present
-
-In this case, no dock-related events are generated but the dock and
-undock commands described below still work. They can be executed
-manually or triggered by Fn key combinations (see the example acpid
-configuration files included in the driver tarball package available
-on the web site).
-
-When the eject request button on the dock is pressed, the first event
-above is generated. The handler for this event should issue the
-following command:
-
- echo undock > /proc/acpi/ibm/dock
-
-After the LED on the dock goes off, it is safe to eject the laptop.
-Note: if you pressed this key by mistake, go ahead and eject the
-laptop, then dock it back in. Otherwise, the dock may not function as
-expected.
-
-When the laptop is docked, the third event above is generated. The
-handler for this event should issue the following command to fully
-enable the dock:
-
- echo dock > /proc/acpi/ibm/dock
-
-The contents of the /proc/acpi/ibm/dock file shows the current status
-of the dock, as provided by the ACPI framework.
-
-The docking support in this driver does not take care of enabling or
-disabling any other devices you may have attached to the dock. For
-example, a CD drive plugged into the UltraBase needs to be disabled or
-enabled separately. See the provided example acpid configuration files
-for how this can be accomplished.
-
-There is no support yet for PCI devices that may be attached to a
-docking station, e.g. in the ThinkPad Dock II. The driver currently
-does not recognize, enable or disable such devices. This means that
-the only docking stations currently supported are the X-series
-UltraBase docks and "dumb" port replicators like the Mini Dock (the
-latter don't need any ACPI support, actually).
-
-
-UltraBay eject -- /proc/acpi/ibm/bay
-------------------------------------
-
-Inserting or ejecting an UltraBay device requires some actions to be
-taken by the operating system to safely make or break the electrical
-connections with the device.
-
-This feature generates the following ACPI events:
-
- ibm/bay MSTR 00000003 00000000 -- eject request
- ibm/bay MSTR 00000001 00000000 -- eject lever inserted
-
-NOTE: These events will only be generated if the UltraBay was present
-when the laptop was originally booted (on the X series, the UltraBay
-is in the dock, so it may not be present if the laptop was undocked).
-This is due to the current lack of support for hot plugging of devices
-in the Linux ACPI framework. If the laptop was booted without the
-UltraBay, the following message is shown in the logs:
-
- Mar 17 01:42:34 aero kernel: thinkpad_acpi: bay device not present
-
-In this case, no bay-related events are generated but the eject
-command described below still works. It can be executed manually or
-triggered by a hot key combination.
-
-Sliding the eject lever generates the first event shown above. The
-handler for this event should take whatever actions are necessary to
-shut down the device in the UltraBay (e.g. call idectl), then issue
-the following command:
-
- echo eject > /proc/acpi/ibm/bay
-
-After the LED on the UltraBay goes off, it is safe to pull out the
-device.
-
-When the eject lever is inserted, the second event above is
-generated. The handler for this event should take whatever actions are
-necessary to enable the UltraBay device (e.g. call idectl).
-
-The contents of the /proc/acpi/ibm/bay file shows the current status
-of the UltraBay, as provided by the ACPI framework.
-
-EXPERIMENTAL warm eject support on the 600e/x, A22p and A3x (To use
-this feature, you need to supply the experimental=1 parameter when
-loading the module):
-
-These models do not have a button near the UltraBay device to request
-a hot eject but rather require the laptop to be put to sleep
-(suspend-to-ram) before the bay device is ejected or inserted).
-The sequence of steps to eject the device is as follows:
-
- echo eject > /proc/acpi/ibm/bay
- put the ThinkPad to sleep
- remove the drive
- resume from sleep
- cat /proc/acpi/ibm/bay should show that the drive was removed
-
-On the A3x, both the UltraBay 2000 and UltraBay Plus devices are
-supported. Use "eject2" instead of "eject" for the second bay.
-
-Note: the UltraBay eject support on the 600e/x, A22p and A3x is
-EXPERIMENTAL and may not work as expected. USE WITH CAUTION!
-
-
CMOS/UCMS control
-----------------
@@ -920,7 +793,7 @@ The available commands are:
echo '<LED number> off' >/proc/acpi/ibm/led
echo '<LED number> blink' >/proc/acpi/ibm/led
-The <LED number> range is 0 to 7. The set of LEDs that can be
+The <LED number> range is 0 to 15. The set of LEDs that can be
controlled varies from model to model. Here is the common ThinkPad
mapping:
@@ -932,6 +805,11 @@ mapping:
5 - UltraBase battery slot
6 - (unknown)
7 - standby
+ 8 - dock status 1
+ 9 - dock status 2
+ 10, 11 - (unknown)
+ 12 - thinkvantage
+ 13, 14, 15 - (unknown)
All of the above can be turned on and off and can be made to blink.
@@ -940,10 +818,12 @@ sysfs notes:
The ThinkPad LED sysfs interface is described in detail by the LED class
documentation, in Documentation/leds-class.txt.
-The leds are named (in LED ID order, from 0 to 7):
+The LEDs are named (in LED ID order, from 0 to 12):
"tpacpi::power", "tpacpi:orange:batt", "tpacpi:green:batt",
"tpacpi::dock_active", "tpacpi::bay_active", "tpacpi::dock_batt",
-"tpacpi::unknown_led", "tpacpi::standby".
+"tpacpi::unknown_led", "tpacpi::standby", "tpacpi::dock_status1",
+"tpacpi::dock_status2", "tpacpi::unknown_led2", "tpacpi::unknown_led3",
+"tpacpi::thinkvantage".
Due to limitations in the sysfs LED class, if the status of the LED
indicators cannot be read due to an error, thinkpad-acpi will report it as
@@ -958,6 +838,12 @@ ThinkPad indicator LED should blink in hardware accelerated mode, use the
"timer" trigger, and leave the delay_on and delay_off parameters set to
zero (to request hardware acceleration autodetection).
+LEDs that are known not to exist in a given ThinkPad model are not
+made available through the sysfs interface. If you have a dock and you
+notice there are LEDs listed for your ThinkPad that do not exist (and
+are not in the dock), or if you notice that there are missing LEDs,
+a report to ibm-acpi-devel@lists.sourceforge.net is appreciated.
+
ACPI sounds -- /proc/acpi/ibm/beep
----------------------------------
@@ -1156,17 +1042,19 @@ may not be distinct. Later Lenovo models that implement the ACPI
display backlight brightness control methods have 16 levels, ranging
from 0 to 15.
-There are two interfaces to the firmware for direct brightness control,
-EC and UCMS (or CMOS). To select which one should be used, use the
-brightness_mode module parameter: brightness_mode=1 selects EC mode,
-brightness_mode=2 selects UCMS mode, brightness_mode=3 selects EC
-mode with NVRAM backing (so that brightness changes are remembered
-across shutdown/reboot).
+For IBM ThinkPads, there are two interfaces to the firmware for direct
+brightness control, EC and UCMS (or CMOS). To select which one should be
+used, use the brightness_mode module parameter: brightness_mode=1 selects
+EC mode, brightness_mode=2 selects UCMS mode, brightness_mode=3 selects EC
+mode with NVRAM backing (so that brightness changes are remembered across
+shutdown/reboot).
The driver tries to select which interface to use from a table of
defaults for each ThinkPad model. If it makes a wrong choice, please
report this as a bug, so that we can fix it.
+Lenovo ThinkPads only support brightness_mode=2 (UCMS).
+
When display backlight brightness controls are available through the
standard ACPI interface, it is best to use it instead of this direct
ThinkPad-specific interface. The driver will disable its native
@@ -1254,7 +1142,7 @@ Fan control and monitoring: fan speed, fan enable/disable
procfs: /proc/acpi/ibm/fan
sysfs device attributes: (hwmon "thinkpad") fan1_input, pwm1,
- pwm1_enable
+ pwm1_enable, fan2_input
sysfs hwmon driver attributes: fan_watchdog
NOTE NOTE NOTE: fan control operations are disabled by default for
@@ -1267,6 +1155,9 @@ from the hardware registers of the embedded controller. This is known
to work on later R, T, X and Z series ThinkPads but may show a bogus
value on other models.
+Some Lenovo ThinkPads support a secondary fan. This fan cannot be
+controlled separately, it shares the main fan control.
+
Fan levels:
Most ThinkPad fans work in "levels" at the firmware interface. Level 0
@@ -1397,6 +1288,11 @@ hwmon device attribute fan1_input:
which can take up to two minutes. May return rubbish on older
ThinkPads.
+hwmon device attribute fan2_input:
+ Fan tachometer reading, in RPM, for the secondary fan.
+ Available only on some ThinkPads. If the secondary fan is
+ not installed, will always read 0.
+
hwmon driver attribute fan_watchdog:
Fan safety watchdog timer interval, in seconds. Minimum is
1 second, maximum is 120 seconds. 0 disables the watchdog.
@@ -1555,3 +1451,7 @@ Sysfs interface changelog:
0x020300: hotkey enable/disable support removed, attributes
hotkey_bios_enabled and hotkey_enable deprecated and
marked for removal.
+
+0x020400: Marker for 16 LEDs support. Also, LEDs that are known
+ to not exist in a given model are not registered with
+ the LED sysfs class anymore.
diff --git a/Documentation/leds-lp3944.txt b/Documentation/leds-lp3944.txt
new file mode 100644
index 00000000000..c6eda18b15e
--- /dev/null
+++ b/Documentation/leds-lp3944.txt
@@ -0,0 +1,50 @@
+Kernel driver lp3944
+====================
+
+ * National Semiconductor LP3944 Fun-light Chip
+ Prefix: 'lp3944'
+ Addresses scanned: None (see the Notes section below)
+ Datasheet: Publicly available at the National Semiconductor website
+ http://www.national.com/pf/LP/LP3944.html
+
+Authors:
+ Antonio Ospite <ospite@studenti.unina.it>
+
+
+Description
+-----------
+The LP3944 is a helper chip that can drive up to 8 leds, with two programmable
+DIM modes; it could even be used as a gpio expander but this driver assumes it
+is used as a led controller.
+
+The DIM modes are used to set _blink_ patterns for leds, the pattern is
+specified supplying two parameters:
+ - period: from 0s to 1.6s
+ - duty cycle: percentage of the period the led is on, from 0 to 100
+
+Setting a led in DIM0 or DIM1 mode makes it blink according to the pattern.
+See the datasheet for details.
+
+LP3944 can be found on Motorola A910 smartphone, where it drives the rgb
+leds, the camera flash light and the lcds power.
+
+
+Notes
+-----
+The chip is used mainly in embedded contexts, so this driver expects it is
+registered using the i2c_board_info mechanism.
+
+To register the chip at address 0x60 on adapter 0, set the platform data
+according to include/linux/leds-lp3944.h, set the i2c board info:
+
+ static struct i2c_board_info __initdata a910_i2c_board_info[] = {
+ {
+ I2C_BOARD_INFO("lp3944", 0x60),
+ .platform_data = &a910_lp3944_leds,
+ },
+ };
+
+and register it in the platform init function
+
+ i2c_register_board_info(0, a910_i2c_board_info,
+ ARRAY_SIZE(a910_i2c_board_info));
diff --git a/Documentation/lguest/Makefile b/Documentation/lguest/Makefile
index 1f4f9e888bd..28c8cdfcafd 100644
--- a/Documentation/lguest/Makefile
+++ b/Documentation/lguest/Makefile
@@ -1,6 +1,5 @@
# This creates the demonstration utility "lguest" which runs a Linux guest.
-CFLAGS:=-Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include -I../../arch/x86/include -U_FORTIFY_SOURCE
-LDLIBS:=-lz
+CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include -I../../arch/x86/include -U_FORTIFY_SOURCE
all: lguest
diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index d36fcc0f271..950cde6d6e5 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -1,7 +1,9 @@
-/*P:100 This is the Launcher code, a simple program which lays out the
- * "physical" memory for the new Guest by mapping the kernel image and
- * the virtual devices, then opens /dev/lguest to tell the kernel
- * about the Guest and control it. :*/
+/*P:100
+ * This is the Launcher code, a simple program which lays out the "physical"
+ * memory for the new Guest by mapping the kernel image and the virtual
+ * devices, then opens /dev/lguest to tell the kernel about the Guest and
+ * control it.
+:*/
#define _LARGEFILE64_SOURCE
#define _GNU_SOURCE
#include <stdio.h>
@@ -16,6 +18,7 @@
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
+#include <sys/eventfd.h>
#include <fcntl.h>
#include <stdbool.h>
#include <errno.h>
@@ -45,13 +48,15 @@
#include "linux/virtio_rng.h"
#include "linux/virtio_ring.h"
#include "asm/bootparam.h"
-/*L:110 We can ignore the 39 include files we need for this program, but I do
- * want to draw attention to the use of kernel-style types.
+/*L:110
+ * We can ignore the 42 include files we need for this program, but I do want
+ * to draw attention to the use of kernel-style types.
*
* As Linus said, "C is a Spartan language, and so should your naming be." I
* like these abbreviations, so we define them here. Note that u64 is always
* unsigned long long, which works on all Linux systems: this means that we can
- * use %llu in printf for any u64. */
+ * use %llu in printf for any u64.
+ */
typedef unsigned long long u64;
typedef uint32_t u32;
typedef uint16_t u16;
@@ -59,7 +64,6 @@ typedef uint8_t u8;
/*:*/
#define PAGE_PRESENT 0x7 /* Present, RW, Execute */
-#define NET_PEERNUM 1
#define BRIDGE_PFX "bridge:"
#ifndef SIOCBRADDIF
#define SIOCBRADDIF 0x89a2 /* add interface to bridge */
@@ -69,38 +73,27 @@ typedef uint8_t u8;
/* This will occupy 3 pages: it must be a power of 2. */
#define VIRTQUEUE_NUM 256
-/*L:120 verbose is both a global flag and a macro. The C preprocessor allows
- * this, and although I wouldn't recommend it, it works quite nicely here. */
+/*L:120
+ * verbose is both a global flag and a macro. The C preprocessor allows
+ * this, and although I wouldn't recommend it, it works quite nicely here.
+ */
static bool verbose;
#define verbose(args...) \
do { if (verbose) printf(args); } while(0)
/*:*/
-/* File descriptors for the Waker. */
-struct {
- int pipe[2];
- int lguest_fd;
-} waker_fds;
-
/* The pointer to the start of guest memory. */
static void *guest_base;
/* The maximum guest physical address allowed, and maximum possible. */
static unsigned long guest_limit, guest_max;
-/* The pipe for signal hander to write to. */
-static int timeoutpipe[2];
-static unsigned int timeout_usec = 500;
+/* The /dev/lguest file descriptor. */
+static int lguest_fd;
/* a per-cpu variable indicating whose vcpu is currently running */
static unsigned int __thread cpu_id;
/* This is our list of devices. */
-struct device_list
-{
- /* Summary information about the devices in our list: ready to pass to
- * select() to ask which need servicing.*/
- fd_set infds;
- int max_infd;
-
+struct device_list {
/* Counter to assign interrupt numbers. */
unsigned int next_irq;
@@ -112,8 +105,7 @@ struct device_list
/* A single linked list of devices. */
struct device *dev;
- /* And a pointer to the last device for easy append and also for
- * configuration appending. */
+ /* And a pointer to the last device for easy append. */
struct device *lastdev;
};
@@ -121,35 +113,32 @@ struct device_list
static struct device_list devices;
/* The device structure describes a single device. */
-struct device
-{
+struct device {
/* The linked-list pointer. */
struct device *next;
- /* The this device's descriptor, as mapped into the Guest. */
+ /* The device's descriptor, as mapped into the Guest. */
struct lguest_device_desc *desc;
+ /* We can't trust desc values once Guest has booted: we use these. */
+ unsigned int feature_len;
+ unsigned int num_vq;
+
/* The name of this device, for --verbose. */
const char *name;
- /* If handle_input is set, it wants to be called when this file
- * descriptor is ready. */
- int fd;
- bool (*handle_input)(int fd, struct device *me);
-
/* Any queues attached to this device */
struct virtqueue *vq;
- /* Handle status being finalized (ie. feature bits stable). */
- void (*ready)(struct device *me);
+ /* Is it operational */
+ bool running;
/* Device-specific data. */
void *priv;
};
/* The virtqueue structure describes a queue attached to a device. */
-struct virtqueue
-{
+struct virtqueue {
struct virtqueue *next;
/* Which device owns me. */
@@ -164,31 +153,41 @@ struct virtqueue
/* Last available index we saw. */
u16 last_avail_idx;
- /* The routine to call when the Guest pings us, or timeout. */
- void (*handle_output)(int fd, struct virtqueue *me, bool timeout);
+ /* How many are used since we sent last irq? */
+ unsigned int pending_used;
- /* Outstanding buffers */
- unsigned int inflight;
+ /* Eventfd where Guest notifications arrive. */
+ int eventfd;
- /* Is this blocked awaiting a timer? */
- bool blocked;
+ /* Function for the thread which is servicing this virtqueue. */
+ void (*service)(struct virtqueue *vq);
+ pid_t thread;
};
/* Remember the arguments to the program so we can "reboot" */
static char **main_args;
-/* Since guest is UP and we don't run at the same time, we don't need barriers.
- * But I include them in the code in case others copy it. */
-#define wmb()
+/* The original tty settings to restore on exit. */
+static struct termios orig_term;
-/* Convert an iovec element to the given type.
+/*
+ * We have to be careful with barriers: our devices are all run in separate
+ * threads and so we need to make sure that changes visible to the Guest happen
+ * in precise order.
+ */
+#define wmb() __asm__ __volatile__("" : : : "memory")
+#define mb() __asm__ __volatile__("" : : : "memory")
+
+/*
+ * Convert an iovec element to the given type.
*
* This is a fairly ugly trick: we need to know the size of the type and
* alignment requirement to check the pointer is kosher. It's also nice to
* have the name of the type in case we report failure.
*
* Typing those three things all the time is cumbersome and error prone, so we
- * have a macro which sets them all up and passes to the real function. */
+ * have a macro which sets them all up and passes to the real function.
+ */
#define convert(iov, type) \
((type *)_convert((iov), sizeof(type), __alignof__(type), #type))
@@ -205,8 +204,10 @@ static void *_convert(struct iovec *iov, size_t size, size_t align,
/* Wrapper for the last available index. Makes it easier to change. */
#define lg_last_avail(vq) ((vq)->last_avail_idx)
-/* The virtio configuration space is defined to be little-endian. x86 is
- * little-endian too, but it's nice to be explicit so we have these helpers. */
+/*
+ * The virtio configuration space is defined to be little-endian. x86 is
+ * little-endian too, but it's nice to be explicit so we have these helpers.
+ */
#define cpu_to_le16(v16) (v16)
#define cpu_to_le32(v32) (v32)
#define cpu_to_le64(v64) (v64)
@@ -245,14 +246,15 @@ static void iov_consume(struct iovec iov[], unsigned num_iov, unsigned len)
static u8 *get_feature_bits(struct device *dev)
{
return (u8 *)(dev->desc + 1)
- + dev->desc->num_vq * sizeof(struct lguest_vqconfig);
+ + dev->num_vq * sizeof(struct lguest_vqconfig);
}
-/*L:100 The Launcher code itself takes us out into userspace, that scary place
- * where pointers run wild and free! Unfortunately, like most userspace
- * programs, it's quite boring (which is why everyone likes to hack on the
- * kernel!). Perhaps if you make up an Lguest Drinking Game at this point, it
- * will get you through this section. Or, maybe not.
+/*L:100
+ * The Launcher code itself takes us out into userspace, that scary place where
+ * pointers run wild and free! Unfortunately, like most userspace programs,
+ * it's quite boring (which is why everyone likes to hack on the kernel!).
+ * Perhaps if you make up an Lguest Drinking Game at this point, it will get
+ * you through this section. Or, maybe not.
*
* The Launcher sets up a big chunk of memory to be the Guest's "physical"
* memory and stores it in "guest_base". In other words, Guest physical ==
@@ -260,7 +262,8 @@ static u8 *get_feature_bits(struct device *dev)
*
* This can be tough to get your head around, but usually it just means that we
* use these trivial conversion functions when the Guest gives us it's
- * "physical" addresses: */
+ * "physical" addresses:
+ */
static void *from_guest_phys(unsigned long addr)
{
return guest_base + addr;
@@ -275,7 +278,8 @@ static unsigned long to_guest_phys(const void *addr)
* Loading the Kernel.
*
* We start with couple of simple helper routines. open_or_die() avoids
- * error-checking code cluttering the callers: */
+ * error-checking code cluttering the callers:
+ */
static int open_or_die(const char *name, int flags)
{
int fd = open(name, flags);
@@ -290,12 +294,19 @@ static void *map_zeroed_pages(unsigned int num)
int fd = open_or_die("/dev/zero", O_RDONLY);
void *addr;
- /* We use a private mapping (ie. if we write to the page, it will be
- * copied). */
+ /*
+ * We use a private mapping (ie. if we write to the page, it will be
+ * copied).
+ */
addr = mmap(NULL, getpagesize() * num,
PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, fd, 0);
if (addr == MAP_FAILED)
err(1, "Mmaping %u pages of /dev/zero", num);
+
+ /*
+ * One neat mmap feature is that you can close the fd, and it
+ * stays mapped.
+ */
close(fd);
return addr;
@@ -312,20 +323,24 @@ static void *get_pages(unsigned int num)
return addr;
}
-/* This routine is used to load the kernel or initrd. It tries mmap, but if
+/*
+ * This routine is used to load the kernel or initrd. It tries mmap, but if
* that fails (Plan 9's kernel file isn't nicely aligned on page boundaries),
- * it falls back to reading the memory in. */
+ * it falls back to reading the memory in.
+ */
static void map_at(int fd, void *addr, unsigned long offset, unsigned long len)
{
ssize_t r;
- /* We map writable even though for some segments are marked read-only.
+ /*
+ * We map writable even though for some segments are marked read-only.
* The kernel really wants to be writable: it patches its own
* instructions.
*
* MAP_PRIVATE means that the page won't be copied until a write is
* done to it. This allows us to share untouched memory between
- * Guests. */
+ * Guests.
+ */
if (mmap(addr, len, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_FIXED|MAP_PRIVATE, fd, offset) != MAP_FAILED)
return;
@@ -336,7 +351,8 @@ static void map_at(int fd, void *addr, unsigned long offset, unsigned long len)
err(1, "Reading offset %lu len %lu gave %zi", offset, len, r);
}
-/* This routine takes an open vmlinux image, which is in ELF, and maps it into
+/*
+ * This routine takes an open vmlinux image, which is in ELF, and maps it into
* the Guest memory. ELF = Embedded Linking Format, which is the format used
* by all modern binaries on Linux including the kernel.
*
@@ -344,23 +360,28 @@ static void map_at(int fd, void *addr, unsigned long offset, unsigned long len)
* address. We use the physical address; the Guest will map itself to the
* virtual address.
*
- * We return the starting address. */
+ * We return the starting address.
+ */
static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr)
{
Elf32_Phdr phdr[ehdr->e_phnum];
unsigned int i;
- /* Sanity checks on the main ELF header: an x86 executable with a
- * reasonable number of correctly-sized program headers. */
+ /*
+ * Sanity checks on the main ELF header: an x86 executable with a
+ * reasonable number of correctly-sized program headers.
+ */
if (ehdr->e_type != ET_EXEC
|| ehdr->e_machine != EM_386
|| ehdr->e_phentsize != sizeof(Elf32_Phdr)
|| ehdr->e_phnum < 1 || ehdr->e_phnum > 65536U/sizeof(Elf32_Phdr))
errx(1, "Malformed elf header");
- /* An ELF executable contains an ELF header and a number of "program"
+ /*
+ * An ELF executable contains an ELF header and a number of "program"
* headers which indicate which parts ("segments") of the program to
- * load where. */
+ * load where.
+ */
/* We read in all the program headers at once: */
if (lseek(elf_fd, ehdr->e_phoff, SEEK_SET) < 0)
@@ -368,8 +389,10 @@ static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr)
if (read(elf_fd, phdr, sizeof(phdr)) != sizeof(phdr))
err(1, "Reading program headers");
- /* Try all the headers: there are usually only three. A read-only one,
- * a read-write one, and a "note" section which we don't load. */
+ /*
+ * Try all the headers: there are usually only three. A read-only one,
+ * a read-write one, and a "note" section which we don't load.
+ */
for (i = 0; i < ehdr->e_phnum; i++) {
/* If this isn't a loadable segment, we ignore it */
if (phdr[i].p_type != PT_LOAD)
@@ -387,13 +410,15 @@ static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr)
return ehdr->e_entry;
}
-/*L:150 A bzImage, unlike an ELF file, is not meant to be loaded. You're
- * supposed to jump into it and it will unpack itself. We used to have to
- * perform some hairy magic because the unpacking code scared me.
+/*L:150
+ * A bzImage, unlike an ELF file, is not meant to be loaded. You're supposed
+ * to jump into it and it will unpack itself. We used to have to perform some
+ * hairy magic because the unpacking code scared me.
*
* Fortunately, Jeremy Fitzhardinge convinced me it wasn't that hard and wrote
* a small patch to jump over the tricky bits in the Guest, so now we just read
- * the funky header so we know where in the file to load, and away we go! */
+ * the funky header so we know where in the file to load, and away we go!
+ */
static unsigned long load_bzimage(int fd)
{
struct boot_params boot;
@@ -401,8 +426,10 @@ static unsigned long load_bzimage(int fd)
/* Modern bzImages get loaded at 1M. */
void *p = from_guest_phys(0x100000);
- /* Go back to the start of the file and read the header. It should be
- * a Linux boot header (see Documentation/x86/i386/boot.txt) */
+ /*
+ * Go back to the start of the file and read the header. It should be
+ * a Linux boot header (see Documentation/x86/i386/boot.txt)
+ */
lseek(fd, 0, SEEK_SET);
read(fd, &boot, sizeof(boot));
@@ -421,9 +448,11 @@ static unsigned long load_bzimage(int fd)
return boot.hdr.code32_start;
}
-/*L:140 Loading the kernel is easy when it's a "vmlinux", but most kernels
+/*L:140
+ * Loading the kernel is easy when it's a "vmlinux", but most kernels
* come wrapped up in the self-decompressing "bzImage" format. With a little
- * work, we can load those, too. */
+ * work, we can load those, too.
+ */
static unsigned long load_kernel(int fd)
{
Elf32_Ehdr hdr;
@@ -440,24 +469,28 @@ static unsigned long load_kernel(int fd)
return load_bzimage(fd);
}
-/* This is a trivial little helper to align pages. Andi Kleen hated it because
+/*
+ * This is a trivial little helper to align pages. Andi Kleen hated it because
* it calls getpagesize() twice: "it's dumb code."
*
* Kernel guys get really het up about optimization, even when it's not
- * necessary. I leave this code as a reaction against that. */
+ * necessary. I leave this code as a reaction against that.
+ */
static inline unsigned long page_align(unsigned long addr)
{
/* Add upwards and truncate downwards. */
return ((addr + getpagesize()-1) & ~(getpagesize()-1));
}
-/*L:180 An "initial ram disk" is a disk image loaded into memory along with
- * the kernel which the kernel can use to boot from without needing any
- * drivers. Most distributions now use this as standard: the initrd contains
- * the code to load the appropriate driver modules for the current machine.
+/*L:180
+ * An "initial ram disk" is a disk image loaded into memory along with the
+ * kernel which the kernel can use to boot from without needing any drivers.
+ * Most distributions now use this as standard: the initrd contains the code to
+ * load the appropriate driver modules for the current machine.
*
* Importantly, James Morris works for RedHat, and Fedora uses initrds for its
- * kernels. He sent me this (and tells me when I break it). */
+ * kernels. He sent me this (and tells me when I break it).
+ */
static unsigned long load_initrd(const char *name, unsigned long mem)
{
int ifd;
@@ -469,12 +502,16 @@ static unsigned long load_initrd(const char *name, unsigned long mem)
if (fstat(ifd, &st) < 0)
err(1, "fstat() on initrd '%s'", name);
- /* We map the initrd at the top of memory, but mmap wants it to be
- * page-aligned, so we round the size up for that. */
+ /*
+ * We map the initrd at the top of memory, but mmap wants it to be
+ * page-aligned, so we round the size up for that.
+ */
len = page_align(st.st_size);
map_at(ifd, from_guest_phys(mem - len), 0, st.st_size);
- /* Once a file is mapped, you can close the file descriptor. It's a
- * little odd, but quite useful. */
+ /*
+ * Once a file is mapped, you can close the file descriptor. It's a
+ * little odd, but quite useful.
+ */
close(ifd);
verbose("mapped initrd %s size=%lu @ %p\n", name, len, (void*)mem-len);
@@ -483,8 +520,10 @@ static unsigned long load_initrd(const char *name, unsigned long mem)
}
/*:*/
-/* Simple routine to roll all the commandline arguments together with spaces
- * between them. */
+/*
+ * Simple routine to roll all the commandline arguments together with spaces
+ * between them.
+ */
static void concat(char *dst, char *args[])
{
unsigned int i, len = 0;
@@ -501,104 +540,26 @@ static void concat(char *dst, char *args[])
dst[len] = '\0';
}
-/*L:185 This is where we actually tell the kernel to initialize the Guest. We
+/*L:185
+ * This is where we actually tell the kernel to initialize the Guest. We
* saw the arguments it expects when we looked at initialize() in lguest_user.c:
* the base of Guest "physical" memory, the top physical page to allow and the
- * entry point for the Guest. */
-static int tell_kernel(unsigned long start)
+ * entry point for the Guest.
+ */
+static void tell_kernel(unsigned long start)
{
unsigned long args[] = { LHREQ_INITIALIZE,
(unsigned long)guest_base,
guest_limit / getpagesize(), start };
- int fd;
-
verbose("Guest: %p - %p (%#lx)\n",
guest_base, guest_base + guest_limit, guest_limit);
- fd = open_or_die("/dev/lguest", O_RDWR);
- if (write(fd, args, sizeof(args)) < 0)
+ lguest_fd = open_or_die("/dev/lguest", O_RDWR);
+ if (write(lguest_fd, args, sizeof(args)) < 0)
err(1, "Writing to /dev/lguest");
-
- /* We return the /dev/lguest file descriptor to control this Guest */
- return fd;
}
/*:*/
-static void add_device_fd(int fd)
-{
- FD_SET(fd, &devices.infds);
- if (fd > devices.max_infd)
- devices.max_infd = fd;
-}
-
/*L:200
- * The Waker.
- *
- * With console, block and network devices, we can have lots of input which we
- * need to process. We could try to tell the kernel what file descriptors to
- * watch, but handing a file descriptor mask through to the kernel is fairly
- * icky.
- *
- * Instead, we clone off a thread which watches the file descriptors and writes
- * the LHREQ_BREAK command to the /dev/lguest file descriptor to tell the Host
- * stop running the Guest. This causes the Launcher to return from the
- * /dev/lguest read with -EAGAIN, where it will write to /dev/lguest to reset
- * the LHREQ_BREAK and wake us up again.
- *
- * This, of course, is merely a different *kind* of icky.
- *
- * Given my well-known antipathy to threads, I'd prefer to use processes. But
- * it's easier to share Guest memory with threads, and trivial to share the
- * devices.infds as the Launcher changes it.
- */
-static int waker(void *unused)
-{
- /* Close the write end of the pipe: only the Launcher has it open. */
- close(waker_fds.pipe[1]);
-
- for (;;) {
- fd_set rfds = devices.infds;
- unsigned long args[] = { LHREQ_BREAK, 1 };
- unsigned int maxfd = devices.max_infd;
-
- /* We also listen to the pipe from the Launcher. */
- FD_SET(waker_fds.pipe[0], &rfds);
- if (waker_fds.pipe[0] > maxfd)
- maxfd = waker_fds.pipe[0];
-
- /* Wait until input is ready from one of the devices. */
- select(maxfd+1, &rfds, NULL, NULL, NULL);
-
- /* Message from Launcher? */
- if (FD_ISSET(waker_fds.pipe[0], &rfds)) {
- char c;
- /* If this fails, then assume Launcher has exited.
- * Don't do anything on exit: we're just a thread! */
- if (read(waker_fds.pipe[0], &c, 1) != 1)
- _exit(0);
- continue;
- }
-
- /* Send LHREQ_BREAK command to snap the Launcher out of it. */
- pwrite(waker_fds.lguest_fd, args, sizeof(args), cpu_id);
- }
- return 0;
-}
-
-/* This routine just sets up a pipe to the Waker process. */
-static void setup_waker(int lguest_fd)
-{
- /* This pipe is closed when Launcher dies, telling Waker. */
- if (pipe(waker_fds.pipe) != 0)
- err(1, "Creating pipe for Waker");
-
- /* Waker also needs to know the lguest fd */
- waker_fds.lguest_fd = lguest_fd;
-
- if (clone(waker, malloc(4096) + 4096, CLONE_VM | SIGCHLD, NULL) == -1)
- err(1, "Creating Waker");
-}
-
-/*
* Device Handling.
*
* When the Guest gives us a buffer, it sends an array of addresses and sizes.
@@ -609,65 +570,125 @@ static void setup_waker(int lguest_fd)
static void *_check_pointer(unsigned long addr, unsigned int size,
unsigned int line)
{
- /* We have to separately check addr and addr+size, because size could
- * be huge and addr + size might wrap around. */
+ /*
+ * We have to separately check addr and addr+size, because size could
+ * be huge and addr + size might wrap around.
+ */
if (addr >= guest_limit || addr + size >= guest_limit)
errx(1, "%s:%i: Invalid address %#lx", __FILE__, line, addr);
- /* We return a pointer for the caller's convenience, now we know it's
- * safe to use. */
+ /*
+ * We return a pointer for the caller's convenience, now we know it's
+ * safe to use.
+ */
return from_guest_phys(addr);
}
/* A macro which transparently hands the line number to the real function. */
#define check_pointer(addr,size) _check_pointer(addr, size, __LINE__)
-/* Each buffer in the virtqueues is actually a chain of descriptors. This
+/*
+ * Each buffer in the virtqueues is actually a chain of descriptors. This
* function returns the next descriptor in the chain, or vq->vring.num if we're
- * at the end. */
-static unsigned next_desc(struct virtqueue *vq, unsigned int i)
+ * at the end.
+ */
+static unsigned next_desc(struct vring_desc *desc,
+ unsigned int i, unsigned int max)
{
unsigned int next;
/* If this descriptor says it doesn't chain, we're done. */
- if (!(vq->vring.desc[i].flags & VRING_DESC_F_NEXT))
- return vq->vring.num;
+ if (!(desc[i].flags & VRING_DESC_F_NEXT))
+ return max;
/* Check they're not leading us off end of descriptors. */
- next = vq->vring.desc[i].next;
+ next = desc[i].next;
/* Make sure compiler knows to grab that: we don't want it changing! */
wmb();
- if (next >= vq->vring.num)
+ if (next >= max)
errx(1, "Desc next is %u", next);
return next;
}
-/* This looks in the virtqueue and for the first available buffer, and converts
+/*
+ * This actually sends the interrupt for this virtqueue, if we've used a
+ * buffer.
+ */
+static void trigger_irq(struct virtqueue *vq)
+{
+ unsigned long buf[] = { LHREQ_IRQ, vq->config.irq };
+
+ /* Don't inform them if nothing used. */
+ if (!vq->pending_used)
+ return;
+ vq->pending_used = 0;
+
+ /* If they don't want an interrupt, don't send one, unless empty. */
+ if ((vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
+ && lg_last_avail(vq) != vq->vring.avail->idx)
+ return;
+
+ /* Send the Guest an interrupt tell them we used something up. */
+ if (write(lguest_fd, buf, sizeof(buf)) != 0)
+ err(1, "Triggering irq %i", vq->config.irq);
+}
+
+/*
+ * This looks in the virtqueue for the first available buffer, and converts
* it to an iovec for convenient access. Since descriptors consist of some
* number of output then some number of input descriptors, it's actually two
* iovecs, but we pack them into one and note how many of each there were.
*
- * This function returns the descriptor number found, or vq->vring.num (which
- * is never a valid descriptor number) if none was found. */
-static unsigned get_vq_desc(struct virtqueue *vq,
- struct iovec iov[],
- unsigned int *out_num, unsigned int *in_num)
-{
- unsigned int i, head;
- u16 last_avail;
+ * This function waits if necessary, and returns the descriptor number found.
+ */
+static unsigned wait_for_vq_desc(struct virtqueue *vq,
+ struct iovec iov[],
+ unsigned int *out_num, unsigned int *in_num)
+{
+ unsigned int i, head, max;
+ struct vring_desc *desc;
+ u16 last_avail = lg_last_avail(vq);
+
+ /* There's nothing available? */
+ while (last_avail == vq->vring.avail->idx) {
+ u64 event;
+
+ /*
+ * Since we're about to sleep, now is a good time to tell the
+ * Guest about what we've used up to now.
+ */
+ trigger_irq(vq);
+
+ /* OK, now we need to know about added descriptors. */
+ vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
+
+ /*
+ * They could have slipped one in as we were doing that: make
+ * sure it's written, then check again.
+ */
+ mb();
+ if (last_avail != vq->vring.avail->idx) {
+ vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
+ break;
+ }
+
+ /* Nothing new? Wait for eventfd to tell us they refilled. */
+ if (read(vq->eventfd, &event, sizeof(event)) != sizeof(event))
+ errx(1, "Event read failed?");
+
+ /* We don't need to be notified again. */
+ vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
+ }
/* Check it isn't doing very strange things with descriptor numbers. */
- last_avail = lg_last_avail(vq);
if ((u16)(vq->vring.avail->idx - last_avail) > vq->vring.num)
errx(1, "Guest moved used index from %u to %u",
last_avail, vq->vring.avail->idx);
- /* If there's nothing new since last we looked, return invalid. */
- if (vq->vring.avail->idx == last_avail)
- return vq->vring.num;
-
- /* Grab the next descriptor number they're advertising, and increment
- * the index we've seen. */
+ /*
+ * Grab the next descriptor number they're advertising, and increment
+ * the index we've seen.
+ */
head = vq->vring.avail->ring[last_avail % vq->vring.num];
lg_last_avail(vq)++;
@@ -678,87 +699,84 @@ static unsigned get_vq_desc(struct virtqueue *vq,
/* When we start there are none of either input nor output. */
*out_num = *in_num = 0;
+ max = vq->vring.num;
+ desc = vq->vring.desc;
i = head;
+
+ /*
+ * If this is an indirect entry, then this buffer contains a descriptor
+ * table which we handle as if it's any normal descriptor chain.
+ */
+ if (desc[i].flags & VRING_DESC_F_INDIRECT) {
+ if (desc[i].len % sizeof(struct vring_desc))
+ errx(1, "Invalid size for indirect buffer table");
+
+ max = desc[i].len / sizeof(struct vring_desc);
+ desc = check_pointer(desc[i].addr, desc[i].len);
+ i = 0;
+ }
+
do {
/* Grab the first descriptor, and check it's OK. */
- iov[*out_num + *in_num].iov_len = vq->vring.desc[i].len;
+ iov[*out_num + *in_num].iov_len = desc[i].len;
iov[*out_num + *in_num].iov_base
- = check_pointer(vq->vring.desc[i].addr,
- vq->vring.desc[i].len);
+ = check_pointer(desc[i].addr, desc[i].len);
/* If this is an input descriptor, increment that count. */
- if (vq->vring.desc[i].flags & VRING_DESC_F_WRITE)
+ if (desc[i].flags & VRING_DESC_F_WRITE)
(*in_num)++;
else {
- /* If it's an output descriptor, they're all supposed
- * to come before any input descriptors. */
+ /*
+ * If it's an output descriptor, they're all supposed
+ * to come before any input descriptors.
+ */
if (*in_num)
errx(1, "Descriptor has out after in");
(*out_num)++;
}
/* If we've got too many, that implies a descriptor loop. */
- if (*out_num + *in_num > vq->vring.num)
+ if (*out_num + *in_num > max)
errx(1, "Looped descriptor");
- } while ((i = next_desc(vq, i)) != vq->vring.num);
+ } while ((i = next_desc(desc, i, max)) != max);
- vq->inflight++;
return head;
}
-/* After we've used one of their buffers, we tell them about it. We'll then
- * want to send them an interrupt, using trigger_irq(). */
+/*
+ * After we've used one of their buffers, we tell the Guest about it. Sometime
+ * later we'll want to send them an interrupt using trigger_irq(); note that
+ * wait_for_vq_desc() does that for us if it has to wait.
+ */
static void add_used(struct virtqueue *vq, unsigned int head, int len)
{
struct vring_used_elem *used;
- /* The virtqueue contains a ring of used buffers. Get a pointer to the
- * next entry in that used ring. */
+ /*
+ * The virtqueue contains a ring of used buffers. Get a pointer to the
+ * next entry in that used ring.
+ */
used = &vq->vring.used->ring[vq->vring.used->idx % vq->vring.num];
used->id = head;
used->len = len;
/* Make sure buffer is written before we update index. */
wmb();
vq->vring.used->idx++;
- vq->inflight--;
-}
-
-/* This actually sends the interrupt for this virtqueue */
-static void trigger_irq(int fd, struct virtqueue *vq)
-{
- unsigned long buf[] = { LHREQ_IRQ, vq->config.irq };
-
- /* If they don't want an interrupt, don't send one, unless empty. */
- if ((vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
- && vq->inflight)
- return;
-
- /* Send the Guest an interrupt tell them we used something up. */
- if (write(fd, buf, sizeof(buf)) != 0)
- err(1, "Triggering irq %i", vq->config.irq);
+ vq->pending_used++;
}
/* And here's the combo meal deal. Supersize me! */
-static void add_used_and_trigger(int fd, struct virtqueue *vq,
- unsigned int head, int len)
+static void add_used_and_trigger(struct virtqueue *vq, unsigned head, int len)
{
add_used(vq, head, len);
- trigger_irq(fd, vq);
+ trigger_irq(vq);
}
/*
* The Console
*
- * Here is the input terminal setting we save, and the routine to restore them
- * on exit so the user gets their terminal back. */
-static struct termios orig_term;
-static void restore_term(void)
-{
- tcsetattr(STDIN_FILENO, TCSANOW, &orig_term);
-}
-
-/* We associate some data with the console for our exit hack. */
-struct console_abort
-{
+ * We associate some data with the console for our exit hack.
+ */
+struct console_abort {
/* How many times have they hit ^C? */
int count;
/* When did they start? */
@@ -766,282 +784,360 @@ struct console_abort
};
/* This is the routine which handles console input (ie. stdin). */
-static bool handle_console_input(int fd, struct device *dev)
+static void console_input(struct virtqueue *vq)
{
int len;
unsigned int head, in_num, out_num;
- struct iovec iov[dev->vq->vring.num];
- struct console_abort *abort = dev->priv;
-
- /* First we need a console buffer from the Guests's input virtqueue. */
- head = get_vq_desc(dev->vq, iov, &out_num, &in_num);
-
- /* If they're not ready for input, stop listening to this file
- * descriptor. We'll start again once they add an input buffer. */
- if (head == dev->vq->vring.num)
- return false;
+ struct console_abort *abort = vq->dev->priv;
+ struct iovec iov[vq->vring.num];
+ /* Make sure there's a descriptor available. */
+ head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
if (out_num)
errx(1, "Output buffers in console in queue?");
- /* This is why we convert to iovecs: the readv() call uses them, and so
- * it reads straight into the Guest's buffer. */
- len = readv(dev->fd, iov, in_num);
+ /* Read into it. This is where we usually wait. */
+ len = readv(STDIN_FILENO, iov, in_num);
if (len <= 0) {
- /* This implies that the console is closed, is /dev/null, or
- * something went terribly wrong. */
+ /* Ran out of input? */
warnx("Failed to get console input, ignoring console.");
- /* Put the input terminal back. */
- restore_term();
- /* Remove callback from input vq, so it doesn't restart us. */
- dev->vq->handle_output = NULL;
- /* Stop listening to this fd: don't call us again. */
- return false;
+ /*
+ * For simplicity, dying threads kill the whole Launcher. So
+ * just nap here.
+ */
+ for (;;)
+ pause();
}
- /* Tell the Guest about the new input. */
- add_used_and_trigger(fd, dev->vq, head, len);
+ /* Tell the Guest we used a buffer. */
+ add_used_and_trigger(vq, head, len);
- /* Three ^C within one second? Exit.
+ /*
+ * Three ^C within one second? Exit.
*
- * This is such a hack, but works surprisingly well. Each ^C has to be
- * in a buffer by itself, so they can't be too fast. But we check that
- * we get three within about a second, so they can't be too slow. */
- if (len == 1 && ((char *)iov[0].iov_base)[0] == 3) {
- if (!abort->count++)
- gettimeofday(&abort->start, NULL);
- else if (abort->count == 3) {
- struct timeval now;
- gettimeofday(&now, NULL);
- if (now.tv_sec <= abort->start.tv_sec+1) {
- unsigned long args[] = { LHREQ_BREAK, 0 };
- /* Close the fd so Waker will know it has to
- * exit. */
- close(waker_fds.pipe[1]);
- /* Just in case Waker is blocked in BREAK, send
- * unbreak now. */
- write(fd, args, sizeof(args));
- exit(2);
- }
- abort->count = 0;
- }
- } else
- /* Any other key resets the abort counter. */
+ * This is such a hack, but works surprisingly well. Each ^C has to
+ * be in a buffer by itself, so they can't be too fast. But we check
+ * that we get three within about a second, so they can't be too
+ * slow.
+ */
+ if (len != 1 || ((char *)iov[0].iov_base)[0] != 3) {
abort->count = 0;
+ return;
+ }
- /* Everything went OK! */
- return true;
+ abort->count++;
+ if (abort->count == 1)
+ gettimeofday(&abort->start, NULL);
+ else if (abort->count == 3) {
+ struct timeval now;
+ gettimeofday(&now, NULL);
+ /* Kill all Launcher processes with SIGINT, like normal ^C */
+ if (now.tv_sec <= abort->start.tv_sec+1)
+ kill(0, SIGINT);
+ abort->count = 0;
+ }
}
-/* Handling output for console is simple: we just get all the output buffers
- * and write them to stdout. */
-static void handle_console_output(int fd, struct virtqueue *vq, bool timeout)
+/* This is the routine which handles console output (ie. stdout). */
+static void console_output(struct virtqueue *vq)
{
unsigned int head, out, in;
- int len;
struct iovec iov[vq->vring.num];
- /* Keep getting output buffers from the Guest until we run out. */
- while ((head = get_vq_desc(vq, iov, &out, &in)) != vq->vring.num) {
- if (in)
- errx(1, "Input buffers in output queue?");
- len = writev(STDOUT_FILENO, iov, out);
- add_used_and_trigger(fd, vq, head, len);
- }
-}
-
-/* This is called when we no longer want to hear about Guest changes to a
- * virtqueue. This is more efficient in high-traffic cases, but it means we
- * have to set a timer to check if any more changes have occurred. */
-static void block_vq(struct virtqueue *vq)
-{
- struct itimerval itm;
+ /* We usually wait in here, for the Guest to give us something. */
+ head = wait_for_vq_desc(vq, iov, &out, &in);
+ if (in)
+ errx(1, "Input buffers in console output queue?");
- vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
- vq->blocked = true;
-
- itm.it_interval.tv_sec = 0;
- itm.it_interval.tv_usec = 0;
- itm.it_value.tv_sec = 0;
- itm.it_value.tv_usec = timeout_usec;
+ /* writev can return a partial write, so we loop here. */
+ while (!iov_empty(iov, out)) {
+ int len = writev(STDOUT_FILENO, iov, out);
+ if (len <= 0)
+ err(1, "Write to stdout gave %i", len);
+ iov_consume(iov, out, len);
+ }
- setitimer(ITIMER_REAL, &itm, NULL);
+ /*
+ * We're finished with that buffer: if we're going to sleep,
+ * wait_for_vq_desc() will prod the Guest with an interrupt.
+ */
+ add_used(vq, head, 0);
}
/*
* The Network
*
* Handling output for network is also simple: we get all the output buffers
- * and write them (ignoring the first element) to this device's file descriptor
- * (/dev/net/tun).
+ * and write them to /dev/net/tun.
*/
-static void handle_net_output(int fd, struct virtqueue *vq, bool timeout)
+struct net_info {
+ int tunfd;
+};
+
+static void net_output(struct virtqueue *vq)
{
- unsigned int head, out, in, num = 0;
- int len;
+ struct net_info *net_info = vq->dev->priv;
+ unsigned int head, out, in;
struct iovec iov[vq->vring.num];
- static int last_timeout_num;
-
- /* Keep getting output buffers from the Guest until we run out. */
- while ((head = get_vq_desc(vq, iov, &out, &in)) != vq->vring.num) {
- if (in)
- errx(1, "Input buffers in output queue?");
- len = writev(vq->dev->fd, iov, out);
- if (len < 0)
- err(1, "Writing network packet to tun");
- add_used_and_trigger(fd, vq, head, len);
- num++;
- }
- /* Block further kicks and set up a timer if we saw anything. */
- if (!timeout && num)
- block_vq(vq);
-
- /* We never quite know how long should we wait before we check the
- * queue again for more packets. We start at 500 microseconds, and if
- * we get fewer packets than last time, we assume we made the timeout
- * too small and increase it by 10 microseconds. Otherwise, we drop it
- * by one microsecond every time. It seems to work well enough. */
- if (timeout) {
- if (num < last_timeout_num)
- timeout_usec += 10;
- else if (timeout_usec > 1)
- timeout_usec--;
- last_timeout_num = num;
- }
+ /* We usually wait in here for the Guest to give us a packet. */
+ head = wait_for_vq_desc(vq, iov, &out, &in);
+ if (in)
+ errx(1, "Input buffers in net output queue?");
+ /*
+ * Send the whole thing through to /dev/net/tun. It expects the exact
+ * same format: what a coincidence!
+ */
+ if (writev(net_info->tunfd, iov, out) < 0)
+ errx(1, "Write to tun failed?");
+
+ /*
+ * Done with that one; wait_for_vq_desc() will send the interrupt if
+ * all packets are processed.
+ */
+ add_used(vq, head, 0);
}
-/* This is where we handle a packet coming in from the tun device to our
- * Guest. */
-static bool handle_tun_input(int fd, struct device *dev)
+/*
+ * Handling network input is a bit trickier, because I've tried to optimize it.
+ *
+ * First we have a helper routine which tells is if from this file descriptor
+ * (ie. the /dev/net/tun device) will block:
+ */
+static bool will_block(int fd)
+{
+ fd_set fdset;
+ struct timeval zero = { 0, 0 };
+ FD_ZERO(&fdset);
+ FD_SET(fd, &fdset);
+ return select(fd+1, &fdset, NULL, NULL, &zero) != 1;
+}
+
+/*
+ * This handles packets coming in from the tun device to our Guest. Like all
+ * service routines, it gets called again as soon as it returns, so you don't
+ * see a while(1) loop here.
+ */
+static void net_input(struct virtqueue *vq)
{
- unsigned int head, in_num, out_num;
int len;
- struct iovec iov[dev->vq->vring.num];
-
- /* First we need a network buffer from the Guests's recv virtqueue. */
- head = get_vq_desc(dev->vq, iov, &out_num, &in_num);
- if (head == dev->vq->vring.num) {
- /* Now, it's expected that if we try to send a packet too
- * early, the Guest won't be ready yet. Wait until the device
- * status says it's ready. */
- /* FIXME: Actually want DRIVER_ACTIVE here. */
-
- /* Now tell it we want to know if new things appear. */
- dev->vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
- wmb();
-
- /* We'll turn this back on if input buffers are registered. */
- return false;
- } else if (out_num)
- errx(1, "Output buffers in network recv queue?");
-
- /* Read the packet from the device directly into the Guest's buffer. */
- len = readv(dev->fd, iov, in_num);
+ unsigned int head, out, in;
+ struct iovec iov[vq->vring.num];
+ struct net_info *net_info = vq->dev->priv;
+
+ /*
+ * Get a descriptor to write an incoming packet into. This will also
+ * send an interrupt if they're out of descriptors.
+ */
+ head = wait_for_vq_desc(vq, iov, &out, &in);
+ if (out)
+ errx(1, "Output buffers in net input queue?");
+
+ /*
+ * If it looks like we'll block reading from the tun device, send them
+ * an interrupt.
+ */
+ if (vq->pending_used && will_block(net_info->tunfd))
+ trigger_irq(vq);
+
+ /*
+ * Read in the packet. This is where we normally wait (when there's no
+ * incoming network traffic).
+ */
+ len = readv(net_info->tunfd, iov, in);
if (len <= 0)
- err(1, "reading network");
+ err(1, "Failed to read from tun.");
- /* Tell the Guest about the new packet. */
- add_used_and_trigger(fd, dev->vq, head, len);
+ /*
+ * Mark that packet buffer as used, but don't interrupt here. We want
+ * to wait until we've done as much work as we can.
+ */
+ add_used(vq, head, len);
+}
+/*:*/
- verbose("tun input packet len %i [%02x %02x] (%s)\n", len,
- ((u8 *)iov[1].iov_base)[0], ((u8 *)iov[1].iov_base)[1],
- head != dev->vq->vring.num ? "sent" : "discarded");
+/* This is the helper to create threads: run the service routine in a loop. */
+static int do_thread(void *_vq)
+{
+ struct virtqueue *vq = _vq;
- /* All good. */
- return true;
+ for (;;)
+ vq->service(vq);
+ return 0;
}
-/*L:215 This is the callback attached to the network and console input
- * virtqueues: it ensures we try again, in case we stopped console or net
- * delivery because Guest didn't have any buffers. */
-static void enable_fd(int fd, struct virtqueue *vq, bool timeout)
+/*
+ * When a child dies, we kill our entire process group with SIGTERM. This
+ * also has the side effect that the shell restores the console for us!
+ */
+static void kill_launcher(int signal)
{
- add_device_fd(vq->dev->fd);
- /* Snap the Waker out of its select loop. */
- write(waker_fds.pipe[1], "", 1);
+ kill(0, SIGTERM);
}
-static void net_enable_fd(int fd, struct virtqueue *vq, bool timeout)
+static void reset_device(struct device *dev)
{
- /* We don't need to know again when Guest refills receive buffer. */
- vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
- enable_fd(fd, vq, timeout);
+ struct virtqueue *vq;
+
+ verbose("Resetting device %s\n", dev->name);
+
+ /* Clear any features they've acked. */
+ memset(get_feature_bits(dev) + dev->feature_len, 0, dev->feature_len);
+
+ /* We're going to be explicitly killing threads, so ignore them. */
+ signal(SIGCHLD, SIG_IGN);
+
+ /* Zero out the virtqueues, get rid of their threads */
+ for (vq = dev->vq; vq; vq = vq->next) {
+ if (vq->thread != (pid_t)-1) {
+ kill(vq->thread, SIGTERM);
+ waitpid(vq->thread, NULL, 0);
+ vq->thread = (pid_t)-1;
+ }
+ memset(vq->vring.desc, 0,
+ vring_size(vq->config.num, LGUEST_VRING_ALIGN));
+ lg_last_avail(vq) = 0;
+ }
+ dev->running = false;
+
+ /* Now we care if threads die. */
+ signal(SIGCHLD, (void *)kill_launcher);
}
-/* When the Guest tells us they updated the status field, we handle it. */
-static void update_device_status(struct device *dev)
+/*L:216
+ * This actually creates the thread which services the virtqueue for a device.
+ */
+static void create_thread(struct virtqueue *vq)
+{
+ /*
+ * Create stack for thread. Since the stack grows upwards, we point
+ * the stack pointer to the end of this region.
+ */
+ char *stack = malloc(32768);
+ unsigned long args[] = { LHREQ_EVENTFD,
+ vq->config.pfn*getpagesize(), 0 };
+
+ /* Create a zero-initialized eventfd. */
+ vq->eventfd = eventfd(0, 0);
+ if (vq->eventfd < 0)
+ err(1, "Creating eventfd");
+ args[2] = vq->eventfd;
+
+ /*
+ * Attach an eventfd to this virtqueue: it will go off when the Guest
+ * does an LHCALL_NOTIFY for this vq.
+ */
+ if (write(lguest_fd, &args, sizeof(args)) != 0)
+ err(1, "Attaching eventfd");
+
+ /*
+ * CLONE_VM: because it has to access the Guest memory, and SIGCHLD so
+ * we get a signal if it dies.
+ */
+ vq->thread = clone(do_thread, stack + 32768, CLONE_VM | SIGCHLD, vq);
+ if (vq->thread == (pid_t)-1)
+ err(1, "Creating clone");
+
+ /* We close our local copy now the child has it. */
+ close(vq->eventfd);
+}
+
+static void start_device(struct device *dev)
{
+ unsigned int i;
struct virtqueue *vq;
- /* This is a reset. */
- if (dev->desc->status == 0) {
- verbose("Resetting device %s\n", dev->name);
+ verbose("Device %s OK: offered", dev->name);
+ for (i = 0; i < dev->feature_len; i++)
+ verbose(" %02x", get_feature_bits(dev)[i]);
+ verbose(", accepted");
+ for (i = 0; i < dev->feature_len; i++)
+ verbose(" %02x", get_feature_bits(dev)
+ [dev->feature_len+i]);
+
+ for (vq = dev->vq; vq; vq = vq->next) {
+ if (vq->service)
+ create_thread(vq);
+ }
+ dev->running = true;
+}
- /* Clear any features they've acked. */
- memset(get_feature_bits(dev) + dev->desc->feature_len, 0,
- dev->desc->feature_len);
+static void cleanup_devices(void)
+{
+ struct device *dev;
- /* Zero out the virtqueues. */
- for (vq = dev->vq; vq; vq = vq->next) {
- memset(vq->vring.desc, 0,
- vring_size(vq->config.num, LGUEST_VRING_ALIGN));
- lg_last_avail(vq) = 0;
- }
- } else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) {
+ for (dev = devices.dev; dev; dev = dev->next)
+ reset_device(dev);
+
+ /* If we saved off the original terminal settings, restore them now. */
+ if (orig_term.c_lflag & (ISIG|ICANON|ECHO))
+ tcsetattr(STDIN_FILENO, TCSANOW, &orig_term);
+}
+
+/* When the Guest tells us they updated the status field, we handle it. */
+static void update_device_status(struct device *dev)
+{
+ /* A zero status is a reset, otherwise it's a set of flags. */
+ if (dev->desc->status == 0)
+ reset_device(dev);
+ else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) {
warnx("Device %s configuration FAILED", dev->name);
+ if (dev->running)
+ reset_device(dev);
} else if (dev->desc->status & VIRTIO_CONFIG_S_DRIVER_OK) {
- unsigned int i;
-
- verbose("Device %s OK: offered", dev->name);
- for (i = 0; i < dev->desc->feature_len; i++)
- verbose(" %02x", get_feature_bits(dev)[i]);
- verbose(", accepted");
- for (i = 0; i < dev->desc->feature_len; i++)
- verbose(" %02x", get_feature_bits(dev)
- [dev->desc->feature_len+i]);
-
- if (dev->ready)
- dev->ready(dev);
+ if (!dev->running)
+ start_device(dev);
}
}
-/* This is the generic routine we call when the Guest uses LHCALL_NOTIFY. */
-static void handle_output(int fd, unsigned long addr)
+/*L:215
+ * This is the generic routine we call when the Guest uses LHCALL_NOTIFY. In
+ * particular, it's used to notify us of device status changes during boot.
+ */
+static void handle_output(unsigned long addr)
{
struct device *i;
- struct virtqueue *vq;
- /* Check each device and virtqueue. */
+ /* Check each device. */
for (i = devices.dev; i; i = i->next) {
- /* Notifications to device descriptors update device status. */
+ struct virtqueue *vq;
+
+ /*
+ * Notifications to device descriptors mean they updated the
+ * device status.
+ */
if (from_guest_phys(addr) == i->desc) {
update_device_status(i);
return;
}
- /* Notifications to virtqueues mean output has occurred. */
+ /*
+ * Devices *can* be used before status is set to DRIVER_OK.
+ * The original plan was that they would never do this: they
+ * would always finish setting up their status bits before
+ * actually touching the virtqueues. In practice, we allowed
+ * them to, and they do (eg. the disk probes for partition
+ * tables as part of initialization).
+ *
+ * If we see this, we start the device: once it's running, we
+ * expect the device to catch all the notifications.
+ */
for (vq = i->vq; vq; vq = vq->next) {
- if (vq->config.pfn != addr/getpagesize())
+ if (addr != vq->config.pfn*getpagesize())
continue;
-
- /* Guest should acknowledge (and set features!) before
- * using the device. */
- if (i->desc->status == 0) {
- warnx("%s gave early output", i->name);
- return;
- }
-
- if (strcmp(vq->dev->name, "console") != 0)
- verbose("Output to %s\n", vq->dev->name);
- if (vq->handle_output)
- vq->handle_output(fd, vq, false);
+ if (i->running)
+ errx(1, "Notification on running %s", i->name);
+ /* This just calls create_thread() for each virtqueue */
+ start_device(i);
return;
}
}
- /* Early console write is done using notify on a nul-terminated string
- * in Guest memory. */
+ /*
+ * Early console write is done using notify on a nul-terminated string
+ * in Guest memory. It's also great for hacking debugging messages
+ * into a Guest.
+ */
if (addr >= guest_limit)
errx(1, "Bad NOTIFY %#lx", addr);
@@ -1049,71 +1145,6 @@ static void handle_output(int fd, unsigned long addr)
strnlen(from_guest_phys(addr), guest_limit - addr));
}
-static void handle_timeout(int fd)
-{
- char buf[32];
- struct device *i;
- struct virtqueue *vq;
-
- /* Clear the pipe */
- read(timeoutpipe[0], buf, sizeof(buf));
-
- /* Check each device and virtqueue: flush blocked ones. */
- for (i = devices.dev; i; i = i->next) {
- for (vq = i->vq; vq; vq = vq->next) {
- if (!vq->blocked)
- continue;
-
- vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
- vq->blocked = false;
- if (vq->handle_output)
- vq->handle_output(fd, vq, true);
- }
- }
-}
-
-/* This is called when the Waker wakes us up: check for incoming file
- * descriptors. */
-static void handle_input(int fd)
-{
- /* select() wants a zeroed timeval to mean "don't wait". */
- struct timeval poll = { .tv_sec = 0, .tv_usec = 0 };
-
- for (;;) {
- struct device *i;
- fd_set fds = devices.infds;
- int num;
-
- num = select(devices.max_infd+1, &fds, NULL, NULL, &poll);
- /* Could get interrupted */
- if (num < 0)
- continue;
- /* If nothing is ready, we're done. */
- if (num == 0)
- break;
-
- /* Otherwise, call the device(s) which have readable file
- * descriptors and a method of handling them. */
- for (i = devices.dev; i; i = i->next) {
- if (i->handle_input && FD_ISSET(i->fd, &fds)) {
- if (i->handle_input(fd, i))
- continue;
-
- /* If handle_input() returns false, it means we
- * should no longer service it. Networking and
- * console do this when there's no input
- * buffers to deliver into. Console also uses
- * it when it discovers that stdin is closed. */
- FD_CLR(i->fd, &devices.infds);
- }
- }
-
- /* Is this the timeout fd? */
- if (FD_ISSET(timeoutpipe[0], &fds))
- handle_timeout(fd);
- }
-}
-
/*L:190
* Device Setup
*
@@ -1122,20 +1153,24 @@ static void handle_input(int fd)
* routines to allocate and manage them.
*/
-/* The layout of the device page is a "struct lguest_device_desc" followed by a
+/*
+ * The layout of the device page is a "struct lguest_device_desc" followed by a
* number of virtqueue descriptors, then two sets of feature bits, then an
* array of configuration bytes. This routine returns the configuration
- * pointer. */
+ * pointer.
+ */
static u8 *device_config(const struct device *dev)
{
return (void *)(dev->desc + 1)
- + dev->desc->num_vq * sizeof(struct lguest_vqconfig)
- + dev->desc->feature_len * 2;
+ + dev->num_vq * sizeof(struct lguest_vqconfig)
+ + dev->feature_len * 2;
}
-/* This routine allocates a new "struct lguest_device_desc" from descriptor
+/*
+ * This routine allocates a new "struct lguest_device_desc" from descriptor
* table page just above the Guest's normal memory. It returns a pointer to
- * that descriptor. */
+ * that descriptor.
+ */
static struct lguest_device_desc *new_dev_desc(u16 type)
{
struct lguest_device_desc d = { .type = type };
@@ -1156,10 +1191,12 @@ static struct lguest_device_desc *new_dev_desc(u16 type)
return memcpy(p, &d, sizeof(d));
}
-/* Each device descriptor is followed by the description of its virtqueues. We
- * specify how many descriptors the virtqueue is to have. */
+/*
+ * Each device descriptor is followed by the description of its virtqueues. We
+ * specify how many descriptors the virtqueue is to have.
+ */
static void add_virtqueue(struct device *dev, unsigned int num_descs,
- void (*handle_output)(int, struct virtqueue *, bool))
+ void (*service)(struct virtqueue *))
{
unsigned int pages;
struct virtqueue **i, *vq = malloc(sizeof(*vq));
@@ -1174,8 +1211,13 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs,
vq->next = NULL;
vq->last_avail_idx = 0;
vq->dev = dev;
- vq->inflight = 0;
- vq->blocked = false;
+
+ /*
+ * This is the routine the service thread will run, and its Process ID
+ * once it's running.
+ */
+ vq->service = service;
+ vq->thread = (pid_t)-1;
/* Initialize the configuration. */
vq->config.num = num_descs;
@@ -1185,33 +1227,31 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs,
/* Initialize the vring. */
vring_init(&vq->vring, num_descs, p, LGUEST_VRING_ALIGN);
- /* Append virtqueue to this device's descriptor. We use
+ /*
+ * Append virtqueue to this device's descriptor. We use
* device_config() to get the end of the device's current virtqueues;
* we check that we haven't added any config or feature information
- * yet, otherwise we'd be overwriting them. */
+ * yet, otherwise we'd be overwriting them.
+ */
assert(dev->desc->config_len == 0 && dev->desc->feature_len == 0);
memcpy(device_config(dev), &vq->config, sizeof(vq->config));
+ dev->num_vq++;
dev->desc->num_vq++;
verbose("Virtqueue page %#lx\n", to_guest_phys(p));
- /* Add to tail of list, so dev->vq is first vq, dev->vq->next is
- * second. */
+ /*
+ * Add to tail of list, so dev->vq is first vq, dev->vq->next is
+ * second.
+ */
for (i = &dev->vq; *i; i = &(*i)->next);
*i = vq;
-
- /* Set the routine to call when the Guest does something to this
- * virtqueue. */
- vq->handle_output = handle_output;
-
- /* As an optimization, set the advisory "Don't Notify Me" flag if we
- * don't have a handler */
- if (!handle_output)
- vq->vring.used->flags = VRING_USED_F_NO_NOTIFY;
}
-/* The first half of the feature bitmask is for us to advertise features. The
- * second half is for the Guest to accept features. */
+/*
+ * The first half of the feature bitmask is for us to advertise features. The
+ * second half is for the Guest to accept features.
+ */
static void add_feature(struct device *dev, unsigned bit)
{
u8 *features = get_feature_bits(dev);
@@ -1219,15 +1259,17 @@ static void add_feature(struct device *dev, unsigned bit)
/* We can't extend the feature bits once we've added config bytes */
if (dev->desc->feature_len <= bit / CHAR_BIT) {
assert(dev->desc->config_len == 0);
- dev->desc->feature_len = (bit / CHAR_BIT) + 1;
+ dev->feature_len = dev->desc->feature_len = (bit/CHAR_BIT) + 1;
}
features[bit / CHAR_BIT] |= (1 << (bit % CHAR_BIT));
}
-/* This routine sets the configuration fields for an existing device's
+/*
+ * This routine sets the configuration fields for an existing device's
* descriptor. It only works for the last device, but that's OK because that's
- * how we use it. */
+ * how we use it.
+ */
static void set_config(struct device *dev, unsigned len, const void *conf)
{
/* Check we haven't overflowed our single page. */
@@ -1237,33 +1279,36 @@ static void set_config(struct device *dev, unsigned len, const void *conf)
/* Copy in the config information, and store the length. */
memcpy(device_config(dev), conf, len);
dev->desc->config_len = len;
+
+ /* Size must fit in config_len field (8 bits)! */
+ assert(dev->desc->config_len == len);
}
-/* This routine does all the creation and setup of a new device, including
- * calling new_dev_desc() to allocate the descriptor and device memory.
+/*
+ * This routine does all the creation and setup of a new device, including
+ * calling new_dev_desc() to allocate the descriptor and device memory. We
+ * don't actually start the service threads until later.
*
- * See what I mean about userspace being boring? */
-static struct device *new_device(const char *name, u16 type, int fd,
- bool (*handle_input)(int, struct device *))
+ * See what I mean about userspace being boring?
+ */
+static struct device *new_device(const char *name, u16 type)
{
struct device *dev = malloc(sizeof(*dev));
/* Now we populate the fields one at a time. */
- dev->fd = fd;
- /* If we have an input handler for this file descriptor, then we add it
- * to the device_list's fdset and maxfd. */
- if (handle_input)
- add_device_fd(dev->fd);
dev->desc = new_dev_desc(type);
- dev->handle_input = handle_input;
dev->name = name;
dev->vq = NULL;
- dev->ready = NULL;
+ dev->feature_len = 0;
+ dev->num_vq = 0;
+ dev->running = false;
- /* Append to device list. Prepending to a single-linked list is
+ /*
+ * Append to device list. Prepending to a single-linked list is
* easier, but the user expects the devices to be arranged on the bus
* in command-line order. The first network device on the command line
- * is eth0, the first block device /dev/vda, etc. */
+ * is eth0, the first block device /dev/vda, etc.
+ */
if (devices.lastdev)
devices.lastdev->next = dev;
else
@@ -1273,8 +1318,10 @@ static struct device *new_device(const char *name, u16 type, int fd,
return dev;
}
-/* Our first setup routine is the console. It's a fairly simple device, but
- * UNIX tty handling makes it uglier than it could be. */
+/*
+ * Our first setup routine is the console. It's a fairly simple device, but
+ * UNIX tty handling makes it uglier than it could be.
+ */
static void setup_console(void)
{
struct device *dev;
@@ -1282,51 +1329,35 @@ static void setup_console(void)
/* If we can save the initial standard input settings... */
if (tcgetattr(STDIN_FILENO, &orig_term) == 0) {
struct termios term = orig_term;
- /* Then we turn off echo, line buffering and ^C etc. We want a
- * raw input stream to the Guest. */
+ /*
+ * Then we turn off echo, line buffering and ^C etc: We want a
+ * raw input stream to the Guest.
+ */
term.c_lflag &= ~(ISIG|ICANON|ECHO);
tcsetattr(STDIN_FILENO, TCSANOW, &term);
- /* If we exit gracefully, the original settings will be
- * restored so the user can see what they're typing. */
- atexit(restore_term);
}
- dev = new_device("console", VIRTIO_ID_CONSOLE,
- STDIN_FILENO, handle_console_input);
+ dev = new_device("console", VIRTIO_ID_CONSOLE);
+
/* We store the console state in dev->priv, and initialize it. */
dev->priv = malloc(sizeof(struct console_abort));
((struct console_abort *)dev->priv)->count = 0;
- /* The console needs two virtqueues: the input then the output. When
+ /*
+ * The console needs two virtqueues: the input then the output. When
* they put something the input queue, we make sure we're listening to
* stdin. When they put something in the output queue, we write it to
- * stdout. */
- add_virtqueue(dev, VIRTQUEUE_NUM, enable_fd);
- add_virtqueue(dev, VIRTQUEUE_NUM, handle_console_output);
+ * stdout.
+ */
+ add_virtqueue(dev, VIRTQUEUE_NUM, console_input);
+ add_virtqueue(dev, VIRTQUEUE_NUM, console_output);
- verbose("device %u: console\n", devices.device_num++);
+ verbose("device %u: console\n", ++devices.device_num);
}
/*:*/
-static void timeout_alarm(int sig)
-{
- write(timeoutpipe[1], "", 1);
-}
-
-static void setup_timeout(void)
-{
- if (pipe(timeoutpipe) != 0)
- err(1, "Creating timeout pipe");
-
- if (fcntl(timeoutpipe[1], F_SETFL,
- fcntl(timeoutpipe[1], F_GETFL) | O_NONBLOCK) != 0)
- err(1, "Making timeout pipe nonblocking");
-
- add_device_fd(timeoutpipe[0]);
- signal(SIGALRM, timeout_alarm);
-}
-
-/*M:010 Inter-guest networking is an interesting area. Simplest is to have a
+/*M:010
+ * Inter-guest networking is an interesting area. Simplest is to have a
* --sharenet=<name> option which opens or creates a named pipe. This can be
* used to send packets to another guest in a 1:1 manner.
*
@@ -1340,7 +1371,8 @@ static void setup_timeout(void)
* multiple inter-guest channels behind one interface, although it would
* require some manner of hotplugging new virtio channels.
*
- * Finally, we could implement a virtio network switch in the kernel. :*/
+ * Finally, we could implement a virtio network switch in the kernel.
+:*/
static u32 str2ip(const char *ipaddr)
{
@@ -1365,11 +1397,13 @@ static void str2mac(const char *macaddr, unsigned char mac[6])
mac[5] = m[5];
}
-/* This code is "adapted" from libbridge: it attaches the Host end of the
+/*
+ * This code is "adapted" from libbridge: it attaches the Host end of the
* network device to the bridge device specified by the command line.
*
* This is yet another James Morris contribution (I'm an IP-level guy, so I
- * dislike bridging), and I just try not to break it. */
+ * dislike bridging), and I just try not to break it.
+ */
static void add_to_bridge(int fd, const char *if_name, const char *br_name)
{
int ifidx;
@@ -1389,9 +1423,11 @@ static void add_to_bridge(int fd, const char *if_name, const char *br_name)
err(1, "can't add %s to bridge %s", if_name, br_name);
}
-/* This sets up the Host end of the network device with an IP address, brings
+/*
+ * This sets up the Host end of the network device with an IP address, brings
* it up so packets will flow, the copies the MAC address into the hwaddr
- * pointer. */
+ * pointer.
+ */
static void configure_device(int fd, const char *tapif, u32 ipaddr)
{
struct ifreq ifr;
@@ -1418,10 +1454,12 @@ static int get_tun_device(char tapif[IFNAMSIZ])
/* Start with this zeroed. Messy but sure. */
memset(&ifr, 0, sizeof(ifr));
- /* We open the /dev/net/tun device and tell it we want a tap device. A
+ /*
+ * We open the /dev/net/tun device and tell it we want a tap device. A
* tap device is like a tun device, only somehow different. To tell
* the truth, I completely blundered my way through this code, but it
- * works now! */
+ * works now!
+ */
netfd = open_or_die("/dev/net/tun", O_RDWR);
ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
strcpy(ifr.ifr_name, "tap%d");
@@ -1432,39 +1470,46 @@ static int get_tun_device(char tapif[IFNAMSIZ])
TUN_F_CSUM|TUN_F_TSO4|TUN_F_TSO6|TUN_F_TSO_ECN) != 0)
err(1, "Could not set features for tun device");
- /* We don't need checksums calculated for packets coming in this
- * device: trust us! */
+ /*
+ * We don't need checksums calculated for packets coming in this
+ * device: trust us!
+ */
ioctl(netfd, TUNSETNOCSUM, 1);
memcpy(tapif, ifr.ifr_name, IFNAMSIZ);
return netfd;
}
-/*L:195 Our network is a Host<->Guest network. This can either use bridging or
+/*L:195
+ * Our network is a Host<->Guest network. This can either use bridging or
* routing, but the principle is the same: it uses the "tun" device to inject
* packets into the Host as if they came in from a normal network card. We
- * just shunt packets between the Guest and the tun device. */
+ * just shunt packets between the Guest and the tun device.
+ */
static void setup_tun_net(char *arg)
{
struct device *dev;
- int netfd, ipfd;
+ struct net_info *net_info = malloc(sizeof(*net_info));
+ int ipfd;
u32 ip = INADDR_ANY;
bool bridging = false;
char tapif[IFNAMSIZ], *p;
struct virtio_net_config conf;
- netfd = get_tun_device(tapif);
+ net_info->tunfd = get_tun_device(tapif);
/* First we create a new network device. */
- dev = new_device("net", VIRTIO_ID_NET, netfd, handle_tun_input);
+ dev = new_device("net", VIRTIO_ID_NET);
+ dev->priv = net_info;
- /* Network devices need a receive and a send queue, just like
- * console. */
- add_virtqueue(dev, VIRTQUEUE_NUM, net_enable_fd);
- add_virtqueue(dev, VIRTQUEUE_NUM, handle_net_output);
+ /* Network devices need a recv and a send queue, just like console. */
+ add_virtqueue(dev, VIRTQUEUE_NUM, net_input);
+ add_virtqueue(dev, VIRTQUEUE_NUM, net_output);
- /* We need a socket to perform the magic network ioctls to bring up the
- * tap interface, connect to the bridge etc. Any socket will do! */
+ /*
+ * We need a socket to perform the magic network ioctls to bring up the
+ * tap interface, connect to the bridge etc. Any socket will do!
+ */
ipfd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
if (ipfd < 0)
err(1, "opening IP socket");
@@ -1502,6 +1547,8 @@ static void setup_tun_net(char *arg)
add_feature(dev, VIRTIO_NET_F_HOST_TSO4);
add_feature(dev, VIRTIO_NET_F_HOST_TSO6);
add_feature(dev, VIRTIO_NET_F_HOST_ECN);
+ /* We handle indirect ring entries */
+ add_feature(dev, VIRTIO_RING_F_INDIRECT_DESC);
set_config(dev, sizeof(conf), &conf);
/* We don't need the socket any more; setup is done. */
@@ -1516,93 +1563,100 @@ static void setup_tun_net(char *arg)
verbose("device %u: tun %s: %s\n",
devices.device_num, tapif, arg);
}
-
-/* Our block (disk) device should be really simple: the Guest asks for a block
- * number and we read or write that position in the file. Unfortunately, that
- * was amazingly slow: the Guest waits until the read is finished before
- * running anything else, even if it could have been doing useful work.
- *
- * We could use async I/O, except it's reputed to suck so hard that characters
- * actually go missing from your code when you try to use it.
- *
- * So we farm the I/O out to thread, and communicate with it via a pipe. */
+/*:*/
/* This hangs off device->priv. */
-struct vblk_info
-{
+struct vblk_info {
/* The size of the file. */
off64_t len;
/* The file descriptor for the file. */
int fd;
- /* IO thread listens on this file descriptor [0]. */
- int workpipe[2];
-
- /* IO thread writes to this file descriptor to mark it done, then
- * Launcher triggers interrupt to Guest. */
- int done_fd;
};
/*L:210
* The Disk
*
- * Remember that the block device is handled by a separate I/O thread. We head
- * straight into the core of that thread here:
+ * The disk only has one virtqueue, so it only has one thread. It is really
+ * simple: the Guest asks for a block number and we read or write that position
+ * in the file.
+ *
+ * Before we serviced each virtqueue in a separate thread, that was unacceptably
+ * slow: the Guest waits until the read is finished before running anything
+ * else, even if it could have been doing useful work.
+ *
+ * We could have used async I/O, except it's reputed to suck so hard that
+ * characters actually go missing from your code when you try to use it.
*/
-static bool service_io(struct device *dev)
+static void blk_request(struct virtqueue *vq)
{
- struct vblk_info *vblk = dev->priv;
+ struct vblk_info *vblk = vq->dev->priv;
unsigned int head, out_num, in_num, wlen;
int ret;
u8 *in;
struct virtio_blk_outhdr *out;
- struct iovec iov[dev->vq->vring.num];
+ struct iovec iov[vq->vring.num];
off64_t off;
- /* See if there's a request waiting. If not, nothing to do. */
- head = get_vq_desc(dev->vq, iov, &out_num, &in_num);
- if (head == dev->vq->vring.num)
- return false;
+ /*
+ * Get the next request, where we normally wait. It triggers the
+ * interrupt to acknowledge previously serviced requests (if any).
+ */
+ head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
- /* Every block request should contain at least one output buffer
+ /*
+ * Every block request should contain at least one output buffer
* (detailing the location on disk and the type of request) and one
- * input buffer (to hold the result). */
+ * input buffer (to hold the result).
+ */
if (out_num == 0 || in_num == 0)
errx(1, "Bad virtblk cmd %u out=%u in=%u",
head, out_num, in_num);
out = convert(&iov[0], struct virtio_blk_outhdr);
in = convert(&iov[out_num+in_num-1], u8);
+ /*
+ * For historical reasons, block operations are expressed in 512 byte
+ * "sectors".
+ */
off = out->sector * 512;
- /* The block device implements "barriers", where the Guest indicates
+ /*
+ * The block device implements "barriers", where the Guest indicates
* that it wants all previous writes to occur before this write. We
* don't have a way of asking our kernel to do a barrier, so we just
- * synchronize all the data in the file. Pretty poor, no? */
+ * synchronize all the data in the file. Pretty poor, no?
+ */
if (out->type & VIRTIO_BLK_T_BARRIER)
fdatasync(vblk->fd);
- /* In general the virtio block driver is allowed to try SCSI commands.
- * It'd be nice if we supported eject, for example, but we don't. */
+ /*
+ * In general the virtio block driver is allowed to try SCSI commands.
+ * It'd be nice if we supported eject, for example, but we don't.
+ */
if (out->type & VIRTIO_BLK_T_SCSI_CMD) {
fprintf(stderr, "Scsi commands unsupported\n");
*in = VIRTIO_BLK_S_UNSUPP;
wlen = sizeof(*in);
} else if (out->type & VIRTIO_BLK_T_OUT) {
- /* Write */
-
- /* Move to the right location in the block file. This can fail
- * if they try to write past end. */
+ /*
+ * Write
+ *
+ * Move to the right location in the block file. This can fail
+ * if they try to write past end.
+ */
if (lseek64(vblk->fd, off, SEEK_SET) != off)
err(1, "Bad seek to sector %llu", out->sector);
ret = writev(vblk->fd, iov+1, out_num-1);
verbose("WRITE to sector %llu: %i\n", out->sector, ret);
- /* Grr... Now we know how long the descriptor they sent was, we
+ /*
+ * Grr... Now we know how long the descriptor they sent was, we
* make sure they didn't try to write over the end of the block
- * file (possibly extending it). */
+ * file (possibly extending it).
+ */
if (ret > 0 && off + ret > vblk->len) {
/* Trim it back to the correct length */
ftruncate64(vblk->fd, vblk->len);
@@ -1612,10 +1666,12 @@ static bool service_io(struct device *dev)
wlen = sizeof(*in);
*in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR);
} else {
- /* Read */
-
- /* Move to the right location in the block file. This can fail
- * if they try to read past end. */
+ /*
+ * Read
+ *
+ * Move to the right location in the block file. This can fail
+ * if they try to read past end.
+ */
if (lseek64(vblk->fd, off, SEEK_SET) != off)
err(1, "Bad seek to sector %llu", out->sector);
@@ -1630,90 +1686,31 @@ static bool service_io(struct device *dev)
}
}
- /* OK, so we noted that it was pretty poor to use an fdatasync as a
+ /*
+ * OK, so we noted that it was pretty poor to use an fdatasync as a
* barrier. But Christoph Hellwig points out that we need a sync
* *afterwards* as well: "Barriers specify no reordering to the front
- * or the back." And Jens Axboe confirmed it, so here we are: */
+ * or the back." And Jens Axboe confirmed it, so here we are:
+ */
if (out->type & VIRTIO_BLK_T_BARRIER)
fdatasync(vblk->fd);
- /* We can't trigger an IRQ, because we're not the Launcher. It does
- * that when we tell it we're done. */
- add_used(dev->vq, head, wlen);
- return true;
-}
-
-/* This is the thread which actually services the I/O. */
-static int io_thread(void *_dev)
-{
- struct device *dev = _dev;
- struct vblk_info *vblk = dev->priv;
- char c;
-
- /* Close other side of workpipe so we get 0 read when main dies. */
- close(vblk->workpipe[1]);
- /* Close the other side of the done_fd pipe. */
- close(dev->fd);
-
- /* When this read fails, it means Launcher died, so we follow. */
- while (read(vblk->workpipe[0], &c, 1) == 1) {
- /* We acknowledge each request immediately to reduce latency,
- * rather than waiting until we've done them all. I haven't
- * measured to see if it makes any difference.
- *
- * That would be an interesting test, wouldn't it? You could
- * also try having more than one I/O thread. */
- while (service_io(dev))
- write(vblk->done_fd, &c, 1);
- }
- return 0;
-}
-
-/* Now we've seen the I/O thread, we return to the Launcher to see what happens
- * when that thread tells us it's completed some I/O. */
-static bool handle_io_finish(int fd, struct device *dev)
-{
- char c;
-
- /* If the I/O thread died, presumably it printed the error, so we
- * simply exit. */
- if (read(dev->fd, &c, 1) != 1)
- exit(1);
-
- /* It did some work, so trigger the irq. */
- trigger_irq(fd, dev->vq);
- return true;
-}
-
-/* When the Guest submits some I/O, we just need to wake the I/O thread. */
-static void handle_virtblk_output(int fd, struct virtqueue *vq, bool timeout)
-{
- struct vblk_info *vblk = vq->dev->priv;
- char c = 0;
-
- /* Wake up I/O thread and tell it to go to work! */
- if (write(vblk->workpipe[1], &c, 1) != 1)
- /* Presumably it indicated why it died. */
- exit(1);
+ /* Finished that request. */
+ add_used(vq, head, wlen);
}
/*L:198 This actually sets up a virtual block device. */
static void setup_block_file(const char *filename)
{
- int p[2];
struct device *dev;
struct vblk_info *vblk;
- void *stack;
struct virtio_blk_config conf;
- /* This is the pipe the I/O thread will use to tell us I/O is done. */
- pipe(p);
-
- /* The device responds to return from I/O thread. */
- dev = new_device("block", VIRTIO_ID_BLOCK, p[0], handle_io_finish);
+ /* Creat the device. */
+ dev = new_device("block", VIRTIO_ID_BLOCK);
/* The device has one virtqueue, where the Guest places requests. */
- add_virtqueue(dev, VIRTQUEUE_NUM, handle_virtblk_output);
+ add_virtqueue(dev, VIRTQUEUE_NUM, blk_request);
/* Allocate the room for our own bookkeeping */
vblk = dev->priv = malloc(sizeof(*vblk));
@@ -1728,64 +1725,50 @@ static void setup_block_file(const char *filename)
/* Tell Guest how many sectors this device has. */
conf.capacity = cpu_to_le64(vblk->len / 512);
- /* Tell Guest not to put in too many descriptors at once: two are used
- * for the in and out elements. */
+ /*
+ * Tell Guest not to put in too many descriptors at once: two are used
+ * for the in and out elements.
+ */
add_feature(dev, VIRTIO_BLK_F_SEG_MAX);
conf.seg_max = cpu_to_le32(VIRTQUEUE_NUM - 2);
- set_config(dev, sizeof(conf), &conf);
-
- /* The I/O thread writes to this end of the pipe when done. */
- vblk->done_fd = p[1];
-
- /* This is the second pipe, which is how we tell the I/O thread about
- * more work. */
- pipe(vblk->workpipe);
-
- /* Create stack for thread and run it. Since stack grows upwards, we
- * point the stack pointer to the end of this region. */
- stack = malloc(32768);
- /* SIGCHLD - We dont "wait" for our cloned thread, so prevent it from
- * becoming a zombie. */
- if (clone(io_thread, stack + 32768, CLONE_VM | SIGCHLD, dev) == -1)
- err(1, "Creating clone");
-
- /* We don't need to keep the I/O thread's end of the pipes open. */
- close(vblk->done_fd);
- close(vblk->workpipe[0]);
+ /* Don't try to put whole struct: we have 8 bit limit. */
+ set_config(dev, offsetof(struct virtio_blk_config, geometry), &conf);
verbose("device %u: virtblock %llu sectors\n",
- devices.device_num, le64_to_cpu(conf.capacity));
+ ++devices.device_num, le64_to_cpu(conf.capacity));
}
-/* Our random number generator device reads from /dev/random into the Guest's
+/*L:211
+ * Our random number generator device reads from /dev/random into the Guest's
* input buffers. The usual case is that the Guest doesn't want random numbers
* and so has no buffers although /dev/random is still readable, whereas
* console is the reverse.
*
- * The same logic applies, however. */
-static bool handle_rng_input(int fd, struct device *dev)
+ * The same logic applies, however.
+ */
+struct rng_info {
+ int rfd;
+};
+
+static void rng_input(struct virtqueue *vq)
{
int len;
unsigned int head, in_num, out_num, totlen = 0;
- struct iovec iov[dev->vq->vring.num];
+ struct rng_info *rng_info = vq->dev->priv;
+ struct iovec iov[vq->vring.num];
/* First we need a buffer from the Guests's virtqueue. */
- head = get_vq_desc(dev->vq, iov, &out_num, &in_num);
-
- /* If they're not ready for input, stop listening to this file
- * descriptor. We'll start again once they add an input buffer. */
- if (head == dev->vq->vring.num)
- return false;
-
+ head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
if (out_num)
errx(1, "Output buffers in rng?");
- /* This is why we convert to iovecs: the readv() call uses them, and so
- * it reads straight into the Guest's buffer. We loop to make sure we
- * fill it. */
+ /*
+ * Just like the console write, we loop to cover the whole iovec.
+ * In this case, short reads actually happen quite a bit.
+ */
while (!iov_empty(iov, in_num)) {
- len = readv(dev->fd, iov, in_num);
+ len = readv(rng_info->rfd, iov, in_num);
if (len <= 0)
err(1, "Read from /dev/random gave %i", len);
iov_consume(iov, in_num, len);
@@ -1793,25 +1776,26 @@ static bool handle_rng_input(int fd, struct device *dev)
}
/* Tell the Guest about the new input. */
- add_used_and_trigger(fd, dev->vq, head, totlen);
-
- /* Everything went OK! */
- return true;
+ add_used(vq, head, totlen);
}
-/* And this creates a "hardware" random number device for the Guest. */
+/*L:199
+ * This creates a "hardware" random number device for the Guest.
+ */
static void setup_rng(void)
{
struct device *dev;
- int fd;
+ struct rng_info *rng_info = malloc(sizeof(*rng_info));
- fd = open_or_die("/dev/random", O_RDONLY);
+ /* Our device's privat info simply contains the /dev/random fd. */
+ rng_info->rfd = open_or_die("/dev/random", O_RDONLY);
- /* The device responds to return from I/O thread. */
- dev = new_device("rng", VIRTIO_ID_RNG, fd, handle_rng_input);
+ /* Create the new device. */
+ dev = new_device("rng", VIRTIO_ID_RNG);
+ dev->priv = rng_info;
/* The device has one virtqueue, where the Guest places inbufs. */
- add_virtqueue(dev, VIRTQUEUE_NUM, enable_fd);
+ add_virtqueue(dev, VIRTQUEUE_NUM, rng_input);
verbose("device %u: rng\n", devices.device_num++);
}
@@ -1822,22 +1806,27 @@ static void __attribute__((noreturn)) restart_guest(void)
{
unsigned int i;
- /* Since we don't track all open fds, we simply close everything beyond
- * stderr. */
+ /*
+ * Since we don't track all open fds, we simply close everything beyond
+ * stderr.
+ */
for (i = 3; i < FD_SETSIZE; i++)
close(i);
- /* The exec automatically gets rid of the I/O and Waker threads. */
+ /* Reset all the devices (kills all threads). */
+ cleanup_devices();
+
execv(main_args[0], main_args);
err(1, "Could not exec %s", main_args[0]);
}
-/*L:220 Finally we reach the core of the Launcher which runs the Guest, serves
- * its input and output, and finally, lays it to rest. */
-static void __attribute__((noreturn)) run_guest(int lguest_fd)
+/*L:220
+ * Finally we reach the core of the Launcher which runs the Guest, serves
+ * its input and output, and finally, lays it to rest.
+ */
+static void __attribute__((noreturn)) run_guest(void)
{
for (;;) {
- unsigned long args[] = { LHREQ_BREAK, 0 };
unsigned long notify_addr;
int readval;
@@ -1848,8 +1837,7 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
/* One unsigned long means the Guest did HCALL_NOTIFY */
if (readval == sizeof(notify_addr)) {
verbose("Notify on address %#lx\n", notify_addr);
- handle_output(lguest_fd, notify_addr);
- continue;
+ handle_output(notify_addr);
/* ENOENT means the Guest died. Reading tells us why. */
} else if (errno == ENOENT) {
char reason[1024] = { 0 };
@@ -1858,19 +1846,9 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
/* ERESTART means that we need to reboot the guest */
} else if (errno == ERESTART) {
restart_guest();
- /* EAGAIN means a signal (timeout).
- * Anything else means a bug or incompatible change. */
- } else if (errno != EAGAIN)
+ /* Anything else means a bug or incompatible change. */
+ } else
err(1, "Running guest failed");
-
- /* Only service input on thread for CPU 0. */
- if (cpu_id != 0)
- continue;
-
- /* Service input, then unset the BREAK to release the Waker. */
- handle_input(lguest_fd);
- if (pwrite(lguest_fd, args, sizeof(args), cpu_id) < 0)
- err(1, "Resetting break");
}
}
/*L:240
@@ -1880,7 +1858,7 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
*
* Are you ready? Take a deep breath and join me in the core of the Host, in
* "make Host".
- :*/
+:*/
static struct option opts[] = {
{ "verbose", 0, NULL, 'v' },
@@ -1901,11 +1879,10 @@ static void usage(void)
/*L:105 The main routine is where the real work begins: */
int main(int argc, char *argv[])
{
- /* Memory, top-level pagetable, code startpoint and size of the
- * (optional) initrd. */
+ /* Memory, code startpoint and size of the (optional) initrd. */
unsigned long mem = 0, start, initrd_size = 0;
- /* Two temporaries and the /dev/lguest file descriptor. */
- int i, c, lguest_fd;
+ /* Two temporaries. */
+ int i, c;
/* The boot information for the Guest. */
struct boot_params *boot;
/* If they specify an initrd file to load. */
@@ -1913,33 +1890,33 @@ int main(int argc, char *argv[])
/* Save the args: we "reboot" by execing ourselves again. */
main_args = argv;
- /* We don't "wait" for the children, so prevent them from becoming
- * zombies. */
- signal(SIGCHLD, SIG_IGN);
- /* First we initialize the device list. Since console and network
- * device receive input from a file descriptor, we keep an fdset
- * (infds) and the maximum fd number (max_infd) with the head of the
- * list. We also keep a pointer to the last device. Finally, we keep
- * the next interrupt number to use for devices (1: remember that 0 is
- * used by the timer). */
- FD_ZERO(&devices.infds);
- devices.max_infd = -1;
+ /*
+ * First we initialize the device list. We keep a pointer to the last
+ * device, and the next interrupt number to use for devices (1:
+ * remember that 0 is used by the timer).
+ */
devices.lastdev = NULL;
devices.next_irq = 1;
+ /* We're CPU 0. In fact, that's the only CPU possible right now. */
cpu_id = 0;
- /* We need to know how much memory so we can set up the device
+
+ /*
+ * We need to know how much memory so we can set up the device
* descriptor and memory pages for the devices as we parse the command
* line. So we quickly look through the arguments to find the amount
- * of memory now. */
+ * of memory now.
+ */
for (i = 1; i < argc; i++) {
if (argv[i][0] != '-') {
mem = atoi(argv[i]) * 1024 * 1024;
- /* We start by mapping anonymous pages over all of
+ /*
+ * We start by mapping anonymous pages over all of
* guest-physical memory range. This fills it with 0,
* and ensures that the Guest won't be killed when it
- * tries to access it. */
+ * tries to access it.
+ */
guest_base = map_zeroed_pages(mem / getpagesize()
+ DEVICE_PAGES);
guest_limit = mem;
@@ -1972,8 +1949,10 @@ int main(int argc, char *argv[])
usage();
}
}
- /* After the other arguments we expect memory and kernel image name,
- * followed by command line arguments for the kernel. */
+ /*
+ * After the other arguments we expect memory and kernel image name,
+ * followed by command line arguments for the kernel.
+ */
if (optind + 2 > argc)
usage();
@@ -1982,9 +1961,6 @@ int main(int argc, char *argv[])
/* We always have a console device */
setup_console();
- /* We can timeout waiting for Guest network transmit. */
- setup_timeout();
-
/* Now we load the kernel */
start = load_kernel(open_or_die(argv[optind+1], O_RDONLY));
@@ -1994,20 +1970,26 @@ int main(int argc, char *argv[])
/* Map the initrd image if requested (at top of physical memory) */
if (initrd_name) {
initrd_size = load_initrd(initrd_name, mem);
- /* These are the location in the Linux boot header where the
- * start and size of the initrd are expected to be found. */
+ /*
+ * These are the location in the Linux boot header where the
+ * start and size of the initrd are expected to be found.
+ */
boot->hdr.ramdisk_image = mem - initrd_size;
boot->hdr.ramdisk_size = initrd_size;
/* The bootloader type 0xFF means "unknown"; that's OK. */
boot->hdr.type_of_loader = 0xFF;
}
- /* The Linux boot header contains an "E820" memory map: ours is a
- * simple, single region. */
+ /*
+ * The Linux boot header contains an "E820" memory map: ours is a
+ * simple, single region.
+ */
boot->e820_entries = 1;
boot->e820_map[0] = ((struct e820entry) { 0, mem, E820_RAM });
- /* The boot header contains a command line pointer: we put the command
- * line after the boot header. */
+ /*
+ * The boot header contains a command line pointer: we put the command
+ * line after the boot header.
+ */
boot->hdr.cmd_line_ptr = to_guest_phys(boot + 1);
/* We use a simple helper to copy the arguments separated by spaces. */
concat((char *)(boot + 1), argv+optind+2);
@@ -2021,17 +2003,20 @@ int main(int argc, char *argv[])
/* Tell the entry path not to try to reload segment registers. */
boot->hdr.loadflags |= KEEP_SEGMENTS;
- /* We tell the kernel to initialize the Guest: this returns the open
- * /dev/lguest file descriptor. */
- lguest_fd = tell_kernel(start);
+ /*
+ * We tell the kernel to initialize the Guest: this returns the open
+ * /dev/lguest file descriptor.
+ */
+ tell_kernel(start);
+
+ /* Ensure that we terminate if a device-servicing child dies. */
+ signal(SIGCHLD, kill_launcher);
- /* We clone off a thread, which wakes the Launcher whenever one of the
- * input file descriptors needs attention. We call this the Waker, and
- * we'll cover it in a moment. */
- setup_waker(lguest_fd);
+ /* If we exit via err(), this kills all the threads, restores tty. */
+ atexit(cleanup_devices);
/* Finally, run the Guest. This doesn't return. */
- run_guest(lguest_fd);
+ run_guest();
}
/*:*/
diff --git a/Documentation/lguest/lguest.txt b/Documentation/lguest/lguest.txt
index 28c747362f9..efb3a6a045a 100644
--- a/Documentation/lguest/lguest.txt
+++ b/Documentation/lguest/lguest.txt
@@ -37,7 +37,6 @@ Running Lguest:
"Paravirtualized guest support" = Y
"Lguest guest support" = Y
"High Memory Support" = off/4GB
- "PAE (Physical Address Extension) Support" = N
"Alignment value to which kernel should be aligned" = 0x100000
(CONFIG_PARAVIRT=y, CONFIG_LGUEST_GUEST=y, CONFIG_HIGHMEM64G=n and
CONFIG_PHYSICAL_ALIGN=0x100000)
diff --git a/Documentation/local_ops.txt b/Documentation/local_ops.txt
index 23045b8b50f..300da4bdfdb 100644
--- a/Documentation/local_ops.txt
+++ b/Documentation/local_ops.txt
@@ -34,7 +34,7 @@ out of order wrt other memory writes by the owner CPU.
It can be done by slightly modifying the standard atomic operations : only
their UP variant must be kept. It typically means removing LOCK prefix (on
-i386 and x86_64) and any SMP sychronization barrier. If the architecture does
+i386 and x86_64) and any SMP synchronization barrier. If the architecture does
not have a different behavior between SMP and UP, including asm-generic/local.h
in your architecture's local.h is sufficient.
diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
index e20d913d591..abf768c681e 100644
--- a/Documentation/lockdep-design.txt
+++ b/Documentation/lockdep-design.txt
@@ -30,9 +30,9 @@ State
The validator tracks lock-class usage history into 4n + 1 separate state bits:
- 'ever held in STATE context'
-- 'ever head as readlock in STATE context'
-- 'ever head with STATE enabled'
-- 'ever head as readlock with STATE enabled'
+- 'ever held as readlock in STATE context'
+- 'ever held with STATE enabled'
+- 'ever held as readlock with STATE enabled'
Where STATE can be either one of (kernel/lockdep_states.h)
- hardirq
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index f5b7127f54a..7f5809eddee 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -31,6 +31,7 @@ Contents:
- Locking functions.
- Interrupt disabling functions.
+ - Sleep and wake-up functions.
- Miscellaneous functions.
(*) Inter-CPU locking barrier effects.
@@ -1217,6 +1218,132 @@ barriers are required in such a situation, they must be provided from some
other means.
+SLEEP AND WAKE-UP FUNCTIONS
+---------------------------
+
+Sleeping and waking on an event flagged in global data can be viewed as an
+interaction between two pieces of data: the task state of the task waiting for
+the event and the global data used to indicate the event. To make sure that
+these appear to happen in the right order, the primitives to begin the process
+of going to sleep, and the primitives to initiate a wake up imply certain
+barriers.
+
+Firstly, the sleeper normally follows something like this sequence of events:
+
+ for (;;) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (event_indicated)
+ break;
+ schedule();
+ }
+
+A general memory barrier is interpolated automatically by set_current_state()
+after it has altered the task state:
+
+ CPU 1
+ ===============================
+ set_current_state();
+ set_mb();
+ STORE current->state
+ <general barrier>
+ LOAD event_indicated
+
+set_current_state() may be wrapped by:
+
+ prepare_to_wait();
+ prepare_to_wait_exclusive();
+
+which therefore also imply a general memory barrier after setting the state.
+The whole sequence above is available in various canned forms, all of which
+interpolate the memory barrier in the right place:
+
+ wait_event();
+ wait_event_interruptible();
+ wait_event_interruptible_exclusive();
+ wait_event_interruptible_timeout();
+ wait_event_killable();
+ wait_event_timeout();
+ wait_on_bit();
+ wait_on_bit_lock();
+
+
+Secondly, code that performs a wake up normally follows something like this:
+
+ event_indicated = 1;
+ wake_up(&event_wait_queue);
+
+or:
+
+ event_indicated = 1;
+ wake_up_process(event_daemon);
+
+A write memory barrier is implied by wake_up() and co. if and only if they wake
+something up. The barrier occurs before the task state is cleared, and so sits
+between the STORE to indicate the event and the STORE to set TASK_RUNNING:
+
+ CPU 1 CPU 2
+ =============================== ===============================
+ set_current_state(); STORE event_indicated
+ set_mb(); wake_up();
+ STORE current->state <write barrier>
+ <general barrier> STORE current->state
+ LOAD event_indicated
+
+The available waker functions include:
+
+ complete();
+ wake_up();
+ wake_up_all();
+ wake_up_bit();
+ wake_up_interruptible();
+ wake_up_interruptible_all();
+ wake_up_interruptible_nr();
+ wake_up_interruptible_poll();
+ wake_up_interruptible_sync();
+ wake_up_interruptible_sync_poll();
+ wake_up_locked();
+ wake_up_locked_poll();
+ wake_up_nr();
+ wake_up_poll();
+ wake_up_process();
+
+
+[!] Note that the memory barriers implied by the sleeper and the waker do _not_
+order multiple stores before the wake-up with respect to loads of those stored
+values after the sleeper has called set_current_state(). For instance, if the
+sleeper does:
+
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (event_indicated)
+ break;
+ __set_current_state(TASK_RUNNING);
+ do_something(my_data);
+
+and the waker does:
+
+ my_data = value;
+ event_indicated = 1;
+ wake_up(&event_wait_queue);
+
+there's no guarantee that the change to event_indicated will be perceived by
+the sleeper as coming after the change to my_data. In such a circumstance, the
+code on both sides must interpolate its own memory barriers between the
+separate data accesses. Thus the above sleeper ought to do:
+
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (event_indicated) {
+ smp_rmb();
+ do_something(my_data);
+ }
+
+and the waker should do:
+
+ my_data = value;
+ smp_wmb();
+ event_indicated = 1;
+ wake_up(&event_wait_queue);
+
+
MISCELLANEOUS FUNCTIONS
-----------------------
@@ -1366,7 +1493,7 @@ WHERE ARE MEMORY BARRIERS NEEDED?
Under normal operation, memory operation reordering is generally not going to
be a problem as a single-threaded linear piece of code will still appear to
-work correctly, even if it's in an SMP kernel. There are, however, three
+work correctly, even if it's in an SMP kernel. There are, however, four
circumstances in which reordering definitely _could_ be a problem:
(*) Interprocessor interaction.
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 4c2ecf537a4..bbc8a6a3692 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -73,13 +73,13 @@ this phase is triggered automatically. ACPI can notify this event. If not,
(see Section 4.).
Logical Memory Hotplug phase is to change memory state into
-avaiable/unavailable for users. Amount of memory from user's view is
+available/unavailable for users. Amount of memory from user's view is
changed by this phase. The kernel makes all memory in it as free pages
when a memory range is available.
In this document, this phase is described as online/offline.
-Logical Memory Hotplug phase is triggred by write of sysfs file by system
+Logical Memory Hotplug phase is triggered by write of sysfs file by system
administrator. For the hot-add case, it must be executed after Physical Hotplug
phase by hand.
(However, if you writes udev's hotplug scripts for memory hotplug, these
@@ -334,7 +334,7 @@ MEMORY_CANCEL_ONLINE
Generated if MEMORY_GOING_ONLINE fails.
MEMORY_ONLINE
- Generated when memory has succesfully brought online. The callback may
+ Generated when memory has successfully brought online. The callback may
allocate pages from the new memory.
MEMORY_GOING_OFFLINE
@@ -359,7 +359,7 @@ The third argument is passed by pointer of struct memory_notify.
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
- int status_cahnge_nid;
+ int status_change_nid;
}
start_pfn is start_pfn of online/offline memory.
diff --git a/Documentation/mn10300/ABI.txt b/Documentation/mn10300/ABI.txt
index 1fef1f06dfd..d3507bad428 100644
--- a/Documentation/mn10300/ABI.txt
+++ b/Documentation/mn10300/ABI.txt
@@ -26,7 +26,7 @@ registers and the stack. If the first argument is a 64-bit value, it will be
passed in D0:D1. If the first argument is not a 64-bit value, but the second
is, the second will be passed entirely on the stack and D1 will be unused.
-Arguments smaller than 32-bits are not coelesced within a register or a stack
+Arguments smaller than 32-bits are not coalesced within a register or a stack
word. For example, two byte-sized arguments will always be passed in separate
registers or word-sized stack slots.
diff --git a/Documentation/mtd/nand_ecc.txt b/Documentation/mtd/nand_ecc.txt
index bdf93b7f0f2..274821b35a7 100644
--- a/Documentation/mtd/nand_ecc.txt
+++ b/Documentation/mtd/nand_ecc.txt
@@ -50,7 +50,7 @@ byte 255: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp1 rp3 rp5 ... rp15
cp5 cp5 cp5 cp5 cp4 cp4 cp4 cp4
This figure represents a sector of 256 bytes.
-cp is my abbreviaton for column parity, rp for row parity.
+cp is my abbreviation for column parity, rp for row parity.
Let's start to explain column parity.
cp0 is the parity that belongs to all bit0, bit2, bit4, bit6.
@@ -560,7 +560,7 @@ Measuring this code again showed big gain. When executing the original
linux code 1 million times, this took about 1 second on my system.
(using time to measure the performance). After this iteration I was back
to 0.075 sec. Actually I had to decide to start measuring over 10
-million interations in order not to loose too much accuracy. This one
+million iterations in order not to lose too much accuracy. This one
definitely seemed to be the jackpot!
There is a little bit more room for improvement though. There are three
@@ -571,8 +571,8 @@ loop; This eliminates 3 statements per loop. Of course after the loop we
need to correct by adding:
rp4 ^= rp4_6;
rp6 ^= rp4_6
-Furthermore there are 4 sequential assingments to rp8. This can be
-encoded slightly more efficient by saving tmppar before those 4 lines
+Furthermore there are 4 sequential assignments to rp8. This can be
+encoded slightly more efficiently by saving tmppar before those 4 lines
and later do rp8 = rp8 ^ tmppar ^ notrp8;
(where notrp8 is the value of rp8 before those 4 lines).
Again a use of the commutative property of xor.
@@ -622,7 +622,7 @@ Not a big change, but every penny counts :-)
Analysis 7
==========
-Acutally this made things worse. Not very much, but I don't want to move
+Actually this made things worse. Not very much, but I don't want to move
into the wrong direction. Maybe something to investigate later. Could
have to do with caching again.
@@ -642,7 +642,7 @@ Analysis 8
This makes things worse. Let's stick with attempt 6 and continue from there.
Although it seems that the code within the loop cannot be optimised
further there is still room to optimize the generation of the ecc codes.
-We can simply calcualate the total parity. If this is 0 then rp4 = rp5
+We can simply calculate the total parity. If this is 0 then rp4 = rp5
etc. If the parity is 1, then rp4 = !rp5;
But if rp4 = rp5 we do not need rp5 etc. We can just write the even bits
in the result byte and then do something like
diff --git a/Documentation/networking/6pack.txt b/Documentation/networking/6pack.txt
index d0777a1200e..8f339428fdf 100644
--- a/Documentation/networking/6pack.txt
+++ b/Documentation/networking/6pack.txt
@@ -1,7 +1,7 @@
This is the 6pack-mini-HOWTO, written by
Andreas Könsgen DG3KQ
-Internet: ajk@iehk.rwth-aachen.de
+Internet: ajk@comnets.uni-bremen.de
AMPR-net: dg3kq@db0pra.ampr.org
AX.25: dg3kq@db0ach.#nrw.deu.eu
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 08762750f12..d5181ce9ff6 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -221,7 +221,7 @@ ad_select
- Any slave's 802.3ad association state changes
- - The bond's adminstrative state changes to up
+ - The bond's administrative state changes to up
count or 2
@@ -369,7 +369,7 @@ fail_over_mac
When this policy is used in conjuction with the mii
monitor, devices which assert link up prior to being
able to actually transmit and receive are particularly
- susecptible to loss of the gratuitous ARP, and an
+ susceptible to loss of the gratuitous ARP, and an
appropriate updelay setting may be required.
follow or 2
@@ -1794,7 +1794,7 @@ target to query.
generally referred to as "trunk failover." This is a feature of the
switch that causes the link state of a particular switch port to be set
down (or up) when the state of another switch port goes down (or up).
-It's purpose is to propogate link failures from logically "exterior" ports
+Its purpose is to propagate link failures from logically "exterior" ports
to the logically "interior" ports that bonding is able to monitor via
miimon. Availability and configuration for trunk failover varies by
switch, but this can be a viable alternative to the ARP monitor when using
diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index 2035bc4932f..cd79735013f 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -36,10 +36,15 @@ This file contains
6.2 local loopback of sent frames
6.3 CAN controller hardware filters
6.4 The virtual CAN driver (vcan)
- 6.5 currently supported CAN hardware
- 6.6 todo
+ 6.5 The CAN network device driver interface
+ 6.5.1 Netlink interface to set/get devices properties
+ 6.5.2 Setting the CAN bit-timing
+ 6.5.3 Starting and stopping the CAN network device
+ 6.6 supported CAN hardware
- 7 Credits
+ 7 Socket CAN resources
+
+ 8 Credits
============================================================================
@@ -234,6 +239,8 @@ solution for a couple of reasons:
the user application using the common CAN filter mechanisms. Inside
this filter definition the (interested) type of errors may be
selected. The reception of error frames is disabled by default.
+ The format of the CAN error frame is briefly decribed in the Linux
+ header file "include/linux/can/error.h".
4. How to use Socket CAN
------------------------
@@ -327,7 +334,7 @@ solution for a couple of reasons:
return 1;
}
- /* paraniod check ... */
+ /* paranoid check ... */
if (nbytes < sizeof(struct can_frame)) {
fprintf(stderr, "read: incomplete CAN frame\n");
return 1;
@@ -605,61 +612,213 @@ solution for a couple of reasons:
removal of vcan network devices can be managed with the ip(8) tool:
- Create a virtual CAN network interface:
- ip link add type vcan
+ $ ip link add type vcan
- Create a virtual CAN network interface with a specific name 'vcan42':
- ip link add dev vcan42 type vcan
+ $ ip link add dev vcan42 type vcan
- Remove a (virtual CAN) network interface 'vcan42':
- ip link del vcan42
-
- The tool 'vcan' from the SocketCAN SVN repository on BerliOS is obsolete.
-
- Virtual CAN network device creation in older Kernels:
- In Linux Kernel versions < 2.6.24 the vcan driver creates 4 vcan
- netdevices at module load time by default. This value can be changed
- with the module parameter 'numdev'. E.g. 'modprobe vcan numdev=8'
-
- 6.5 currently supported CAN hardware
+ $ ip link del vcan42
+
+ 6.5 The CAN network device driver interface
+
+ The CAN network device driver interface provides a generic interface
+ to setup, configure and monitor CAN network devices. The user can then
+ configure the CAN device, like setting the bit-timing parameters, via
+ the netlink interface using the program "ip" from the "IPROUTE2"
+ utility suite. The following chapter describes briefly how to use it.
+ Furthermore, the interface uses a common data structure and exports a
+ set of common functions, which all real CAN network device drivers
+ should use. Please have a look to the SJA1000 or MSCAN driver to
+ understand how to use them. The name of the module is can-dev.ko.
+
+ 6.5.1 Netlink interface to set/get devices properties
+
+ The CAN device must be configured via netlink interface. The supported
+ netlink message types are defined and briefly described in
+ "include/linux/can/netlink.h". CAN link support for the program "ip"
+ of the IPROUTE2 utility suite is avaiable and it can be used as shown
+ below:
+
+ - Setting CAN device properties:
+
+ $ ip link set can0 type can help
+ Usage: ip link set DEVICE type can
+ [ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
+ [ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
+ phase-seg2 PHASE-SEG2 [ sjw SJW ] ]
+
+ [ loopback { on | off } ]
+ [ listen-only { on | off } ]
+ [ triple-sampling { on | off } ]
+
+ [ restart-ms TIME-MS ]
+ [ restart ]
+
+ Where: BITRATE := { 1..1000000 }
+ SAMPLE-POINT := { 0.000..0.999 }
+ TQ := { NUMBER }
+ PROP-SEG := { 1..8 }
+ PHASE-SEG1 := { 1..8 }
+ PHASE-SEG2 := { 1..8 }
+ SJW := { 1..4 }
+ RESTART-MS := { 0 | NUMBER }
+
+ - Display CAN device details and statistics:
+
+ $ ip -details -statistics link show can0
+ 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP qlen 10
+ link/can
+ can <TRIPLE-SAMPLING> state ERROR-ACTIVE restart-ms 100
+ bitrate 125000 sample_point 0.875
+ tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
+ sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
+ clock 8000000
+ re-started bus-errors arbit-lost error-warn error-pass bus-off
+ 41 17457 0 41 42 41
+ RX: bytes packets errors dropped overrun mcast
+ 140859 17608 17457 0 0 0
+ TX: bytes packets errors dropped carrier collsns
+ 861 112 0 41 0 0
+
+ More info to the above output:
+
+ "<TRIPLE-SAMPLING>"
+ Shows the list of selected CAN controller modes: LOOPBACK,
+ LISTEN-ONLY, or TRIPLE-SAMPLING.
+
+ "state ERROR-ACTIVE"
+ The current state of the CAN controller: "ERROR-ACTIVE",
+ "ERROR-WARNING", "ERROR-PASSIVE", "BUS-OFF" or "STOPPED"
+
+ "restart-ms 100"
+ Automatic restart delay time. If set to a non-zero value, a
+ restart of the CAN controller will be triggered automatically
+ in case of a bus-off condition after the specified delay time
+ in milliseconds. By default it's off.
+
+ "bitrate 125000 sample_point 0.875"
+ Shows the real bit-rate in bits/sec and the sample-point in the
+ range 0.000..0.999. If the calculation of bit-timing parameters
+ is enabled in the kernel (CONFIG_CAN_CALC_BITTIMING=y), the
+ bit-timing can be defined by setting the "bitrate" argument.
+ Optionally the "sample-point" can be specified. By default it's
+ 0.000 assuming CIA-recommended sample-points.
+
+ "tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1"
+ Shows the time quanta in ns, propagation segment, phase buffer
+ segment 1 and 2 and the synchronisation jump width in units of
+ tq. They allow to define the CAN bit-timing in a hardware
+ independent format as proposed by the Bosch CAN 2.0 spec (see
+ chapter 8 of http://www.semiconductors.bosch.de/pdf/can2spec.pdf).
+
+ "sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
+ clock 8000000"
+ Shows the bit-timing constants of the CAN controller, here the
+ "sja1000". The minimum and maximum values of the time segment 1
+ and 2, the synchronisation jump width in units of tq, the
+ bitrate pre-scaler and the CAN system clock frequency in Hz.
+ These constants could be used for user-defined (non-standard)
+ bit-timing calculation algorithms in user-space.
+
+ "re-started bus-errors arbit-lost error-warn error-pass bus-off"
+ Shows the number of restarts, bus and arbitration lost errors,
+ and the state changes to the error-warning, error-passive and
+ bus-off state. RX overrun errors are listed in the "overrun"
+ field of the standard network statistics.
+
+ 6.5.2 Setting the CAN bit-timing
+
+ The CAN bit-timing parameters can always be defined in a hardware
+ independent format as proposed in the Bosch CAN 2.0 specification
+ specifying the arguments "tq", "prop_seg", "phase_seg1", "phase_seg2"
+ and "sjw":
+
+ $ ip link set canX type can tq 125 prop-seg 6 \
+ phase-seg1 7 phase-seg2 2 sjw 1
+
+ If the kernel option CONFIG_CAN_CALC_BITTIMING is enabled, CIA
+ recommended CAN bit-timing parameters will be calculated if the bit-
+ rate is specified with the argument "bitrate":
+
+ $ ip link set canX type can bitrate 125000
+
+ Note that this works fine for the most common CAN controllers with
+ standard bit-rates but may *fail* for exotic bit-rates or CAN system
+ clock frequencies. Disabling CONFIG_CAN_CALC_BITTIMING saves some
+ space and allows user-space tools to solely determine and set the
+ bit-timing parameters. The CAN controller specific bit-timing
+ constants can be used for that purpose. They are listed by the
+ following command:
+
+ $ ip -details link show can0
+ ...
+ sja1000: clock 8000000 tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
+
+ 6.5.3 Starting and stopping the CAN network device
+
+ A CAN network device is started or stopped as usual with the command
+ "ifconfig canX up/down" or "ip link set canX up/down". Be aware that
+ you *must* define proper bit-timing parameters for real CAN devices
+ before you can start it to avoid error-prone default settings:
+
+ $ ip link set canX up type can bitrate 125000
+
+ A device may enter the "bus-off" state if too much errors occurred on
+ the CAN bus. Then no more messages are received or sent. An automatic
+ bus-off recovery can be enabled by setting the "restart-ms" to a
+ non-zero value, e.g.:
+
+ $ ip link set canX type can restart-ms 100
+
+ Alternatively, the application may realize the "bus-off" condition
+ by monitoring CAN error frames and do a restart when appropriate with
+ the command:
+
+ $ ip link set canX type can restart
+
+ Note that a restart will also create a CAN error frame (see also
+ chapter 3.4).
- On the project website http://developer.berlios.de/projects/socketcan
- there are different drivers available:
+ 6.6 Supported CAN hardware
- vcan: Virtual CAN interface driver (if no real hardware is available)
- sja1000: Philips SJA1000 CAN controller (recommended)
- i82527: Intel i82527 CAN controller
- mscan: Motorola/Freescale CAN controller (e.g. inside SOC MPC5200)
- ccan: CCAN controller core (e.g. inside SOC h7202)
- slcan: For a bunch of CAN adaptors that are attached via a
- serial line ASCII protocol (for serial / USB adaptors)
+ Please check the "Kconfig" file in "drivers/net/can" to get an actual
+ list of the support CAN hardware. On the Socket CAN project website
+ (see chapter 7) there might be further drivers available, also for
+ older kernel versions.
- Additionally the different CAN adaptors (ISA/PCI/PCMCIA/USB/Parport)
- from PEAK Systemtechnik support the CAN netdevice driver model
- since Linux driver v6.0: http://www.peak-system.com/linux/index.htm
+7. Socket CAN resources
+-----------------------
- Please check the Mailing Lists on the berlios OSS project website.
+ You can find further resources for Socket CAN like user space tools,
+ support for old kernel versions, more drivers, mailing lists, etc.
+ at the BerliOS OSS project website for Socket CAN:
- 6.6 todo
+ http://developer.berlios.de/projects/socketcan
- The configuration interface for CAN network drivers is still an open
- issue that has not been finalized in the socketcan project. Also the
- idea of having a library module (candev.ko) that holds functions
- that are needed by all CAN netdevices is not ready to ship.
- Your contribution is welcome.
+ If you have questions, bug fixes, etc., don't hesitate to post them to
+ the Socketcan-Users mailing list. But please search the archives first.
-7. Credits
+8. Credits
----------
- Oliver Hartkopp (PF_CAN core, filters, drivers, bcm)
+ Oliver Hartkopp (PF_CAN core, filters, drivers, bcm, SJA1000 driver)
Urs Thuermann (PF_CAN core, kernel integration, socket interfaces, raw, vcan)
Jan Kizka (RT-SocketCAN core, Socket-API reconciliation)
- Wolfgang Grandegger (RT-SocketCAN core & drivers, Raw Socket-API reviews)
+ Wolfgang Grandegger (RT-SocketCAN core & drivers, Raw Socket-API reviews,
+ CAN device driver interface, MSCAN driver)
Robert Schwebel (design reviews, PTXdist integration)
Marc Kleine-Budde (design reviews, Kernel 2.6 cleanups, drivers)
Benedikt Spranger (reviews)
Thomas Gleixner (LKML reviews, coding style, posting hints)
- Andrey Volkov (kernel subtree structure, ioctls, mscan driver)
+ Andrey Volkov (kernel subtree structure, ioctls, MSCAN driver)
Matthias Brukner (first SJA1000 CAN netdevice implementation Q2/2003)
Klaus Hitschler (PEAK driver integration)
Uwe Koppe (CAN netdevices with PF_PACKET approach)
Michael Schulze (driver layer loopback requirement, RT CAN drivers review)
+ Pavel Pisa (Bit-timing calculation)
+ Sascha Hauer (SJA1000 platform driver)
+ Sebastian Haas (SJA1000 EMS PCI driver)
+ Markus Plessing (SJA1000 EMS PCI driver)
+ Per Dalen (SJA1000 Kvaser PCI driver)
+ Sam Ravnborg (reviews, coding style, kbuild help)
diff --git a/Documentation/networking/dm9000.txt b/Documentation/networking/dm9000.txt
index 65df3dea556..5552e2e575c 100644
--- a/Documentation/networking/dm9000.txt
+++ b/Documentation/networking/dm9000.txt
@@ -129,7 +129,7 @@ PHY Link state polling
----------------------
The driver keeps track of the link state and informs the network core
-about link (carrier) availablilty. This is managed by several methods
+about link (carrier) availability. This is managed by several methods
depending on the version of the chip and on which PHY is being used.
For the internal PHY, the original (and currently default) method is
diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt
new file mode 100644
index 00000000000..a0280ad2edc
--- /dev/null
+++ b/Documentation/networking/ieee802154.txt
@@ -0,0 +1,76 @@
+
+ Linux IEEE 802.15.4 implementation
+
+
+Introduction
+============
+
+The Linux-ZigBee project goal is to provide complete implementation
+of IEEE 802.15.4 / ZigBee / 6LoWPAN protocols. IEEE 802.15.4 is a stack
+of protocols for organizing Low-Rate Wireless Personal Area Networks.
+
+Currently only IEEE 802.15.4 layer is implemented. We have choosen
+to use plain Berkeley socket API, the generic Linux networking stack
+to transfer IEEE 802.15.4 messages and a special protocol over genetlink
+for configuration/management
+
+
+Socket API
+==========
+
+int sd = socket(PF_IEEE802154, SOCK_DGRAM, 0);
+.....
+
+The address family, socket addresses etc. are defined in the
+include/net/ieee802154/af_ieee802154.h header or in the special header
+in our userspace package (see either linux-zigbee sourceforge download page
+or git tree at git://linux-zigbee.git.sourceforge.net/gitroot/linux-zigbee).
+
+One can use SOCK_RAW for passing raw data towards device xmit function. YMMV.
+
+
+MLME - MAC Level Management
+============================
+
+Most of IEEE 802.15.4 MLME interfaces are directly mapped on netlink commands.
+See the include/net/ieee802154/nl802154.h header. Our userspace tools package
+(see above) provides CLI configuration utility for radio interfaces and simple
+coordinator for IEEE 802.15.4 networks as an example users of MLME protocol.
+
+
+Kernel side
+=============
+
+Like with WiFi, there are several types of devices implementing IEEE 802.15.4.
+1) 'HardMAC'. The MAC layer is implemented in the device itself, the device
+ exports MLME and data API.
+2) 'SoftMAC' or just radio. These types of devices are just radio transceivers
+ possibly with some kinds of acceleration like automatic CRC computation and
+ comparation, automagic ACK handling, address matching, etc.
+
+Those types of devices require different approach to be hooked into Linux kernel.
+
+
+HardMAC
+=======
+
+See the header include/net/ieee802154/netdevice.h. You have to implement Linux
+net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family
+code via plain sk_buffs. The control block of sk_buffs will contain additional
+info as described in the struct ieee802154_mac_cb.
+
+To hook the MLME interface you have to populate the ml_priv field of your
+net_device with a pointer to struct ieee802154_mlme_ops instance. All fields are
+required.
+
+We provide an example of simple HardMAC driver at drivers/ieee802154/fakehard.c
+
+
+SoftMAC
+=======
+
+We are going to provide intermediate layer impelementing IEEE 802.15.4 MAC
+in software. This is currently WIP.
+
+See header include/net/ieee802154/mac802154.h and several drivers in
+drivers/ieee802154/
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index b121c5db707..8be76235fe6 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -168,7 +168,16 @@ tcp_dsack - BOOLEAN
Allows TCP to send "duplicate" SACKs.
tcp_ecn - BOOLEAN
- Enable Explicit Congestion Notification in TCP.
+ Enable Explicit Congestion Notification (ECN) in TCP. ECN is only
+ used when both ends of the TCP flow support it. It is useful to
+ avoid losses due to congestion (when the bottleneck router supports
+ ECN).
+ Possible values are:
+ 0 disable ECN
+ 1 ECN enabled
+ 2 Only server-side ECN enabled. If the other end does
+ not support ECN, behavior is like with ECN disabled.
+ Default: 2
tcp_fack - BOOLEAN
Enable FACK congestion avoidance and fast retransmission.
@@ -1048,6 +1057,13 @@ disable_ipv6 - BOOLEAN
address.
Default: FALSE (enable IPv6 operation)
+ When this value is changed from 1 to 0 (IPv6 is being enabled),
+ it will dynamically create a link-local address on the given
+ interface and start Duplicate Address Detection, if necessary.
+
+ When this value is changed from 0 to 1 (IPv6 is being disabled),
+ it will dynamically delete all address on the given interface.
+
accept_dad - INTEGER
Whether to accept DAD (Duplicate Address Detection).
0: Disable DAD
diff --git a/Documentation/networking/ipv6.txt b/Documentation/networking/ipv6.txt
index 268e5c103dd..9fd7e21296c 100644
--- a/Documentation/networking/ipv6.txt
+++ b/Documentation/networking/ipv6.txt
@@ -33,3 +33,40 @@ disable
A reboot is required to enable IPv6.
+autoconf
+
+ Specifies whether to enable IPv6 address autoconfiguration
+ on all interfaces. This might be used when one does not wish
+ for addresses to be automatically generated from prefixes
+ received in Router Advertisements.
+
+ The possible values and their effects are:
+
+ 0
+ IPv6 address autoconfiguration is disabled on all interfaces.
+
+ Only the IPv6 loopback address (::1) and link-local addresses
+ will be added to interfaces.
+
+ 1
+ IPv6 address autoconfiguration is enabled on all interfaces.
+
+ This is the default value.
+
+disable_ipv6
+
+ Specifies whether to disable IPv6 on all interfaces.
+ This might be used when no IPv6 addresses are desired.
+
+ The possible values and their effects are:
+
+ 0
+ IPv6 is enabled on all interfaces.
+
+ This is the default value.
+
+ 1
+ IPv6 is disabled on all interfaces.
+
+ No IPv6 addresses will be added to interfaces.
+
diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt
index 2451f551c50..63214b280e0 100644
--- a/Documentation/networking/l2tp.txt
+++ b/Documentation/networking/l2tp.txt
@@ -158,7 +158,7 @@ Sample Userspace Code
}
return 0;
-Miscellanous
+Miscellaneous
============
The PPPoL2TP driver was developed as part of the OpenL2TP project by
diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt
index 84906ef3ed6..b30e81ad530 100644
--- a/Documentation/networking/mac80211-injection.txt
+++ b/Documentation/networking/mac80211-injection.txt
@@ -12,38 +12,22 @@ following format:
The radiotap format is discussed in
./Documentation/networking/radiotap-headers.txt.
-Despite 13 radiotap argument types are currently defined, most only make sense
+Despite many radiotap parameters being currently defined, most only make sense
to appear on received packets. The following information is parsed from the
radiotap headers and used to control injection:
- * IEEE80211_RADIOTAP_RATE
-
- rate in 500kbps units, automatic if invalid or not present
-
-
- * IEEE80211_RADIOTAP_ANTENNA
-
- antenna to use, automatic if not present
-
-
- * IEEE80211_RADIOTAP_DBM_TX_POWER
-
- transmit power in dBm, automatic if not present
-
-
* IEEE80211_RADIOTAP_FLAGS
IEEE80211_RADIOTAP_F_FCS: FCS will be removed and recalculated
IEEE80211_RADIOTAP_F_WEP: frame will be encrypted if key available
IEEE80211_RADIOTAP_F_FRAG: frame will be fragmented if longer than the
- current fragmentation threshold. Note that
- this flag is only reliable when software
- fragmentation is enabled)
+ current fragmentation threshold.
+
The injection code can also skip all other currently defined radiotap fields
facilitating replay of captured radiotap headers directly.
-Here is an example valid radiotap header defining these three parameters
+Here is an example valid radiotap header defining some parameters
0x00, 0x00, // <-- radiotap version
0x0b, 0x00, // <- radiotap header length
@@ -72,8 +56,8 @@ interface), along the following lines:
...
r = pcap_inject(ppcap, u8aSendBuffer, nLength);
-You can also find sources for a complete inject test applet here:
+You can also find a link to a complete inject application here:
-http://penumbra.warmcat.com/_twk/tiki-index.php?page=packetspammer
+http://wireless.kernel.org/en/users/Documentation/packetspammer
Andy Green <andy@warmcat.com>
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt
index a2ab6a0b116..87b3d15f523 100644
--- a/Documentation/networking/netdevices.txt
+++ b/Documentation/networking/netdevices.txt
@@ -74,7 +74,7 @@ dev->hard_start_xmit:
for this and return NETDEV_TX_LOCKED when the spin lock fails.
The locking there should also properly protect against
set_multicast_list. Note that the use of NETIF_F_LLTX is deprecated.
- Dont use it for new drivers.
+ Don't use it for new drivers.
Context: Process with BHs disabled or BH (timer),
will be called with interrupts disabled by netconsole.
diff --git a/Documentation/networking/operstates.txt b/Documentation/networking/operstates.txt
index c9074f9b78b..1a77a3cfae5 100644
--- a/Documentation/networking/operstates.txt
+++ b/Documentation/networking/operstates.txt
@@ -38,9 +38,6 @@ ifinfomsg::if_flags & IFF_LOWER_UP:
ifinfomsg::if_flags & IFF_DORMANT:
Driver has signaled netif_dormant_on()
-These interface flags can also be queried without netlink using the
-SIOCGIFFLAGS ioctl.
-
TLV IFLA_OPERSTATE
contains RFC2863 state of the interface in numeric representation:
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index 07c53d59603..a22fd85e379 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -4,16 +4,18 @@
This file documents the CONFIG_PACKET_MMAP option available with the PACKET
socket interface on 2.4 and 2.6 kernels. This type of sockets is used for
-capture network traffic with utilities like tcpdump or any other that uses
-the libpcap library.
-
-You can find the latest version of this document at
+capture network traffic with utilities like tcpdump or any other that needs
+raw access to network interface.
+You can find the latest version of this document at:
http://pusa.uv.es/~ulisses/packet_mmap/
-Please send me your comments to
+Howto can be found at:
+ http://wiki.gnu-log.net (packet_mmap)
+Please send your comments to
Ulisses Alonso Camaró <uaca@i.hate.spam.alumni.uv.es>
+ Johann Baudy <johann.baudy@gnu-log.net>
-------------------------------------------------------------------------------
+ Why use PACKET_MMAP
@@ -25,19 +27,24 @@ to capture each packet, it requires two if you want to get packet's
timestamp (like libpcap always does).
In the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a size
-configurable circular buffer mapped in user space. This way reading packets just
-needs to wait for them, most of the time there is no need to issue a single
-system call. By using a shared buffer between the kernel and the user
-also has the benefit of minimizing packet copies.
-
-It's fine to use PACKET_MMAP to improve the performance of the capture process,
-but it isn't everything. At least, if you are capturing at high speeds (this
-is relative to the cpu speed), you should check if the device driver of your
-network interface card supports some sort of interrupt load mitigation or
-(even better) if it supports NAPI, also make sure it is enabled.
+configurable circular buffer mapped in user space that can be used to either
+send or receive packets. This way reading packets just needs to wait for them,
+most of the time there is no need to issue a single system call. Concerning
+transmission, multiple packets can be sent through one system call to get the
+highest bandwidth.
+By using a shared buffer between the kernel and the user also has the benefit
+of minimizing packet copies.
+
+It's fine to use PACKET_MMAP to improve the performance of the capture and
+transmission process, but it isn't everything. At least, if you are capturing
+at high speeds (this is relative to the cpu speed), you should check if the
+device driver of your network interface card supports some sort of interrupt
+load mitigation or (even better) if it supports NAPI, also make sure it is
+enabled. For transmission, check the MTU (Maximum Transmission Unit) used and
+supported by devices of your network.
--------------------------------------------------------------------------------
-+ How to use CONFIG_PACKET_MMAP
++ How to use CONFIG_PACKET_MMAP to improve capture process
--------------------------------------------------------------------------------
From the user standpoint, you should use the higher level libpcap library, which
@@ -57,7 +64,7 @@ the low level details or want to improve libpcap by including PACKET_MMAP
support.
--------------------------------------------------------------------------------
-+ How to use CONFIG_PACKET_MMAP directly
++ How to use CONFIG_PACKET_MMAP directly to improve capture process
--------------------------------------------------------------------------------
From the system calls stand point, the use of PACKET_MMAP involves
@@ -66,6 +73,7 @@ the following process:
[setup] socket() -------> creation of the capture socket
setsockopt() ---> allocation of the circular buffer (ring)
+ option: PACKET_RX_RING
mmap() ---------> mapping of the allocated buffer to the
user process
@@ -97,13 +105,75 @@ also the mapping of the circular buffer in the user process and
the use of this buffer.
--------------------------------------------------------------------------------
++ How to use CONFIG_PACKET_MMAP directly to improve transmission process
+--------------------------------------------------------------------------------
+Transmission process is similar to capture as shown below.
+
+[setup] socket() -------> creation of the transmission socket
+ setsockopt() ---> allocation of the circular buffer (ring)
+ option: PACKET_TX_RING
+ bind() ---------> bind transmission socket with a network interface
+ mmap() ---------> mapping of the allocated buffer to the
+ user process
+
+[transmission] poll() ---------> wait for free packets (optional)
+ send() ---------> send all packets that are set as ready in
+ the ring
+ The flag MSG_DONTWAIT can be used to return
+ before end of transfer.
+
+[shutdown] close() --------> destruction of the transmission socket and
+ deallocation of all associated resources.
+
+Binding the socket to your network interface is mandatory (with zero copy) to
+know the header size of frames used in the circular buffer.
+
+As capture, each frame contains two parts:
+
+ --------------------
+| struct tpacket_hdr | Header. It contains the status of
+| | of this frame
+|--------------------|
+| data buffer |
+. . Data that will be sent over the network interface.
+. .
+ --------------------
+
+ bind() associates the socket to your network interface thanks to
+ sll_ifindex parameter of struct sockaddr_ll.
+
+ Initialization example:
+
+ struct sockaddr_ll my_addr;
+ struct ifreq s_ifr;
+ ...
+
+ strncpy (s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name));
+
+ /* get interface index of eth0 */
+ ioctl(this->socket, SIOCGIFINDEX, &s_ifr);
+
+ /* fill sockaddr_ll struct to prepare binding */
+ my_addr.sll_family = AF_PACKET;
+ my_addr.sll_protocol = ETH_P_ALL;
+ my_addr.sll_ifindex = s_ifr.ifr_ifindex;
+
+ /* bind socket to eth0 */
+ bind(this->socket, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));
+
+ A complete tutorial is available at: http://wiki.gnu-log.net/
+
+--------------------------------------------------------------------------------
+ PACKET_MMAP settings
--------------------------------------------------------------------------------
To setup PACKET_MMAP from user level code is done with a call like
+ - Capture process
setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req))
+ - Transmission process
+ setsockopt(fd, SOL_PACKET, PACKET_TX_RING, (void *) &req, sizeof(req))
The most significant argument in the previous call is the req parameter,
this parameter must to have the following structure:
@@ -117,11 +187,11 @@ this parameter must to have the following structure:
};
This structure is defined in /usr/include/linux/if_packet.h and establishes a
-circular buffer (ring) of unswappable memory mapped in the capture process.
+circular buffer (ring) of unswappable memory.
Being mapped in the capture process allows reading the captured frames and
related meta-information like timestamps without requiring a system call.
-Captured frames are grouped in blocks. Each block is a physically contiguous
+Frames are grouped in blocks. Each block is a physically contiguous
region of memory and holds tp_block_size/tp_frame_size frames. The total number
of blocks is tp_block_nr. Note that tp_frame_nr is a redundant parameter because
@@ -336,6 +406,7 @@ struct tpacket_hdr). If this field is 0 means that the frame is ready
to be used for the kernel, If not, there is a frame the user can read
and the following flags apply:
++++ Capture process:
from include/linux/if_packet.h
#define TP_STATUS_COPY 2
@@ -391,6 +462,37 @@ packets are in the ring:
It doesn't incur in a race condition to first check the status value and
then poll for frames.
+
+++ Transmission process
+Those defines are also used for transmission:
+
+ #define TP_STATUS_AVAILABLE 0 // Frame is available
+ #define TP_STATUS_SEND_REQUEST 1 // Frame will be sent on next send()
+ #define TP_STATUS_SENDING 2 // Frame is currently in transmission
+ #define TP_STATUS_WRONG_FORMAT 4 // Frame format is not correct
+
+First, the kernel initializes all frames to TP_STATUS_AVAILABLE. To send a
+packet, the user fills a data buffer of an available frame, sets tp_len to
+current data buffer size and sets its status field to TP_STATUS_SEND_REQUEST.
+This can be done on multiple frames. Once the user is ready to transmit, it
+calls send(). Then all buffers with status equal to TP_STATUS_SEND_REQUEST are
+forwarded to the network device. The kernel updates each status of sent
+frames with TP_STATUS_SENDING until the end of transfer.
+At the end of each transfer, buffer status returns to TP_STATUS_AVAILABLE.
+
+ header->tp_len = in_i_size;
+ header->tp_status = TP_STATUS_SEND_REQUEST;
+ retval = send(this->socket, NULL, 0, 0);
+
+The user can also use poll() to check if a buffer is available:
+(status == TP_STATUS_SENDING)
+
+ struct pollfd pfd;
+ pfd.fd = fd;
+ pfd.revents = 0;
+ pfd.events = POLLOUT;
+ retval = poll(&pfd, 1, timeout);
+
--------------------------------------------------------------------------------
+ THANKS
--------------------------------------------------------------------------------
diff --git a/Documentation/networking/phonet.txt b/Documentation/networking/phonet.txt
index 6a07e45d4a9..6e8ce09f9c7 100644
--- a/Documentation/networking/phonet.txt
+++ b/Documentation/networking/phonet.txt
@@ -36,7 +36,7 @@ Phonet packets have a common header as follows:
On Linux, the link-layer header includes the pn_media byte (see below).
The next 7 bytes are part of the network-layer header.
-The device ID is split: the 6 higher-order bits consitute the device
+The device ID is split: the 6 higher-order bits constitute the device
address, while the 2 lower-order bits are used for multiplexing, as are
the 8-bit object identifiers. As such, Phonet can be considered as a
network layer with 6 bits of address space and 10 bits for transport
diff --git a/Documentation/networking/regulatory.txt b/Documentation/networking/regulatory.txt
index dcf31648414..eaa1a25946c 100644
--- a/Documentation/networking/regulatory.txt
+++ b/Documentation/networking/regulatory.txt
@@ -89,7 +89,7 @@ added to this document when its support is enabled.
Device drivers who provide their own built regulatory domain
do not need a callback as the channels registered by them are
the only ones that will be allowed and therefore *additional*
-cannels cannot be enabled.
+channels cannot be enabled.
Example code - drivers hinting an alpha2:
------------------------------------------
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index 421e7d00ffd..c9abbd86bc1 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -75,9 +75,6 @@ may need to apply in domain-specific ways to their devices:
struct bus_type {
...
int (*suspend)(struct device *dev, pm_message_t state);
- int (*suspend_late)(struct device *dev, pm_message_t state);
-
- int (*resume_early)(struct device *dev);
int (*resume)(struct device *dev);
};
@@ -226,20 +223,7 @@ The phases are seen by driver notifications issued in this order:
This call should handle parts of device suspend logic that require
sleeping. It probably does work to quiesce the device which hasn't
- been abstracted into class.suspend() or bus.suspend_late().
-
- 3 bus.suspend_late(dev, message) is called with IRQs disabled, and
- with only one CPU active. Until the bus.resume_early() phase
- completes (see later), IRQs are not enabled again. This method
- won't be exposed by all busses; for message based busses like USB,
- I2C, or SPI, device interactions normally require IRQs. This bus
- call may be morphed into a driver call with bus-specific parameters.
-
- This call might save low level hardware state that might otherwise
- be lost in the upcoming low power state, and actually put the
- device into a low power state ... so that in some cases the device
- may stay partly usable until this late. This "late" call may also
- help when coping with hardware that behaves badly.
+ been abstracted into class.suspend().
The pm_message_t parameter is currently used to refine those semantics
(described later).
@@ -351,19 +335,11 @@ devices processing each phase's calls before the next phase begins.
The phases are seen by driver notifications issued in this order:
- 1 bus.resume_early(dev) is called with IRQs disabled, and with
- only one CPU active. As with bus.suspend_late(), this method
- won't be supported on busses that require IRQs in order to
- interact with devices.
-
- This reverses the effects of bus.suspend_late().
-
- 2 bus.resume(dev) is called next. This may be morphed into a device
- driver call with bus-specific parameters; implementations may sleep.
-
- This reverses the effects of bus.suspend().
+ 1 bus.resume(dev) reverses the effects of bus.suspend(). This may
+ be morphed into a device driver call with bus-specific parameters;
+ implementations may sleep.
- 3 class.resume(dev) is called for devices associated with a class
+ 2 class.resume(dev) is called for devices associated with a class
that has such a method. Implementations may sleep.
This reverses the effects of class.suspend(), and would usually
diff --git a/Documentation/power/regulator/consumer.txt b/Documentation/power/regulator/consumer.txt
index 82b7a43aadb..5f83fd24ea8 100644
--- a/Documentation/power/regulator/consumer.txt
+++ b/Documentation/power/regulator/consumer.txt
@@ -178,5 +178,5 @@ Consumers can uregister interest by calling :-
int regulator_unregister_notifier(struct regulator *regulator,
struct notifier_block *nb);
-Regulators use the kernel notifier framework to send event to thier interested
+Regulators use the kernel notifier framework to send event to their interested
consumers.
diff --git a/Documentation/power/regulator/overview.txt b/Documentation/power/regulator/overview.txt
index bdcb332bd7f..0cded696ca0 100644
--- a/Documentation/power/regulator/overview.txt
+++ b/Documentation/power/regulator/overview.txt
@@ -119,7 +119,7 @@ Some terms used in this document:-
battery power, USB power)
Regulator Domains: is the new current limit within the
- regulator operating parameters for input/ouput voltage.
+ regulator operating parameters for input/output voltage.
If the regulator request passes all the constraint tests
then the new regulator value is applied.
diff --git a/Documentation/power/s2ram.txt b/Documentation/power/s2ram.txt
index 2ebdc6091ce..514b94fc931 100644
--- a/Documentation/power/s2ram.txt
+++ b/Documentation/power/s2ram.txt
@@ -63,7 +63,7 @@ hardware during resume operations where a value can be set that will
survive a reboot.
Consequence is that after a resume (even if it is successful) your system
-clock will have a value corresponding to the magic mumber instead of the
+clock will have a value corresponding to the magic number instead of the
correct date/time! It is therefore advisable to use a program like ntp-date
or rdate to reset the correct date/time from an external time source when
using this trace option.
diff --git a/Documentation/power/userland-swsusp.txt b/Documentation/power/userland-swsusp.txt
index 7b99636564c..b967cd9137d 100644
--- a/Documentation/power/userland-swsusp.txt
+++ b/Documentation/power/userland-swsusp.txt
@@ -109,7 +109,7 @@ unfreeze user space processes frozen by SNAPSHOT_UNFREEZE if they are
still frozen when the device is being closed).
Currently it is assumed that the userland utilities reading/writing the
-snapshot image from/to the kernel will use a swap parition, called the resume
+snapshot image from/to the kernel will use a swap partition, called the resume
partition, or a swap file as storage space (if a swap file is used, the resume
partition is the partition that holds this file). However, this is not really
required, as they can use, for example, a special (blank) suspend partition or
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
index d16b7a1c379..79f533f38c6 100644
--- a/Documentation/powerpc/booting-without-of.txt
+++ b/Documentation/powerpc/booting-without-of.txt
@@ -1238,1122 +1238,7 @@ descriptions for the SOC devices for which new nodes have been
defined; this list will expand as more and more SOC-containing
platforms are moved over to use the flattened-device-tree model.
- a) PHY nodes
-
- Required properties:
-
- - device_type : Should be "ethernet-phy"
- - interrupts : <a b> where a is the interrupt number and b is a
- field that represents an encoding of the sense and level
- information for the interrupt. This should be encoded based on
- the information in section 2) depending on the type of interrupt
- controller you have.
- - interrupt-parent : the phandle for the interrupt controller that
- services interrupts for this device.
- - reg : The ID number for the phy, usually a small integer
- - linux,phandle : phandle for this node; likely referenced by an
- ethernet controller node.
-
-
- Example:
-
- ethernet-phy@0 {
- linux,phandle = <2452000>
- interrupt-parent = <40000>;
- interrupts = <35 1>;
- reg = <0>;
- device_type = "ethernet-phy";
- };
-
-
- b) Interrupt controllers
-
- Some SOC devices contain interrupt controllers that are different
- from the standard Open PIC specification. The SOC device nodes for
- these types of controllers should be specified just like a standard
- OpenPIC controller. Sense and level information should be encoded
- as specified in section 2) of this chapter for each device that
- specifies an interrupt.
-
- Example :
-
- pic@40000 {
- linux,phandle = <40000>;
- interrupt-controller;
- #address-cells = <0>;
- reg = <40000 40000>;
- compatible = "chrp,open-pic";
- device_type = "open-pic";
- };
-
- c) 4xx/Axon EMAC ethernet nodes
-
- The EMAC ethernet controller in IBM and AMCC 4xx chips, and also
- the Axon bridge. To operate this needs to interact with a ths
- special McMAL DMA controller, and sometimes an RGMII or ZMII
- interface. In addition to the nodes and properties described
- below, the node for the OPB bus on which the EMAC sits must have a
- correct clock-frequency property.
-
- i) The EMAC node itself
-
- Required properties:
- - device_type : "network"
-
- - compatible : compatible list, contains 2 entries, first is
- "ibm,emac-CHIP" where CHIP is the host ASIC (440gx,
- 405gp, Axon) and second is either "ibm,emac" or
- "ibm,emac4". For Axon, thus, we have: "ibm,emac-axon",
- "ibm,emac4"
- - interrupts : <interrupt mapping for EMAC IRQ and WOL IRQ>
- - interrupt-parent : optional, if needed for interrupt mapping
- - reg : <registers mapping>
- - local-mac-address : 6 bytes, MAC address
- - mal-device : phandle of the associated McMAL node
- - mal-tx-channel : 1 cell, index of the tx channel on McMAL associated
- with this EMAC
- - mal-rx-channel : 1 cell, index of the rx channel on McMAL associated
- with this EMAC
- - cell-index : 1 cell, hardware index of the EMAC cell on a given
- ASIC (typically 0x0 and 0x1 for EMAC0 and EMAC1 on
- each Axon chip)
- - max-frame-size : 1 cell, maximum frame size supported in bytes
- - rx-fifo-size : 1 cell, Rx fifo size in bytes for 10 and 100 Mb/sec
- operations.
- For Axon, 2048
- - tx-fifo-size : 1 cell, Tx fifo size in bytes for 10 and 100 Mb/sec
- operations.
- For Axon, 2048.
- - fifo-entry-size : 1 cell, size of a fifo entry (used to calculate
- thresholds).
- For Axon, 0x00000010
- - mal-burst-size : 1 cell, MAL burst size (used to calculate thresholds)
- in bytes.
- For Axon, 0x00000100 (I think ...)
- - phy-mode : string, mode of operations of the PHY interface.
- Supported values are: "mii", "rmii", "smii", "rgmii",
- "tbi", "gmii", rtbi", "sgmii".
- For Axon on CAB, it is "rgmii"
- - mdio-device : 1 cell, required iff using shared MDIO registers
- (440EP). phandle of the EMAC to use to drive the
- MDIO lines for the PHY used by this EMAC.
- - zmii-device : 1 cell, required iff connected to a ZMII. phandle of
- the ZMII device node
- - zmii-channel : 1 cell, required iff connected to a ZMII. Which ZMII
- channel or 0xffffffff if ZMII is only used for MDIO.
- - rgmii-device : 1 cell, required iff connected to an RGMII. phandle
- of the RGMII device node.
- For Axon: phandle of plb5/plb4/opb/rgmii
- - rgmii-channel : 1 cell, required iff connected to an RGMII. Which
- RGMII channel is used by this EMAC.
- Fox Axon: present, whatever value is appropriate for each
- EMAC, that is the content of the current (bogus) "phy-port"
- property.
-
- Optional properties:
- - phy-address : 1 cell, optional, MDIO address of the PHY. If absent,
- a search is performed.
- - phy-map : 1 cell, optional, bitmap of addresses to probe the PHY
- for, used if phy-address is absent. bit 0x00000001 is
- MDIO address 0.
- For Axon it can be absent, thouugh my current driver
- doesn't handle phy-address yet so for now, keep
- 0x00ffffff in it.
- - rx-fifo-size-gige : 1 cell, Rx fifo size in bytes for 1000 Mb/sec
- operations (if absent the value is the same as
- rx-fifo-size). For Axon, either absent or 2048.
- - tx-fifo-size-gige : 1 cell, Tx fifo size in bytes for 1000 Mb/sec
- operations (if absent the value is the same as
- tx-fifo-size). For Axon, either absent or 2048.
- - tah-device : 1 cell, optional. If connected to a TAH engine for
- offload, phandle of the TAH device node.
- - tah-channel : 1 cell, optional. If appropriate, channel used on the
- TAH engine.
-
- Example:
-
- EMAC0: ethernet@40000800 {
- device_type = "network";
- compatible = "ibm,emac-440gp", "ibm,emac";
- interrupt-parent = <&UIC1>;
- interrupts = <1c 4 1d 4>;
- reg = <40000800 70>;
- local-mac-address = [00 04 AC E3 1B 1E];
- mal-device = <&MAL0>;
- mal-tx-channel = <0 1>;
- mal-rx-channel = <0>;
- cell-index = <0>;
- max-frame-size = <5dc>;
- rx-fifo-size = <1000>;
- tx-fifo-size = <800>;
- phy-mode = "rmii";
- phy-map = <00000001>;
- zmii-device = <&ZMII0>;
- zmii-channel = <0>;
- };
-
- ii) McMAL node
-
- Required properties:
- - device_type : "dma-controller"
- - compatible : compatible list, containing 2 entries, first is
- "ibm,mcmal-CHIP" where CHIP is the host ASIC (like
- emac) and the second is either "ibm,mcmal" or
- "ibm,mcmal2".
- For Axon, "ibm,mcmal-axon","ibm,mcmal2"
- - interrupts : <interrupt mapping for the MAL interrupts sources:
- 5 sources: tx_eob, rx_eob, serr, txde, rxde>.
- For Axon: This is _different_ from the current
- firmware. We use the "delayed" interrupts for txeob
- and rxeob. Thus we end up with mapping those 5 MPIC
- interrupts, all level positive sensitive: 10, 11, 32,
- 33, 34 (in decimal)
- - dcr-reg : < DCR registers range >
- - dcr-parent : if needed for dcr-reg
- - num-tx-chans : 1 cell, number of Tx channels
- - num-rx-chans : 1 cell, number of Rx channels
-
- iii) ZMII node
-
- Required properties:
- - compatible : compatible list, containing 2 entries, first is
- "ibm,zmii-CHIP" where CHIP is the host ASIC (like
- EMAC) and the second is "ibm,zmii".
- For Axon, there is no ZMII node.
- - reg : <registers mapping>
-
- iv) RGMII node
-
- Required properties:
- - compatible : compatible list, containing 2 entries, first is
- "ibm,rgmii-CHIP" where CHIP is the host ASIC (like
- EMAC) and the second is "ibm,rgmii".
- For Axon, "ibm,rgmii-axon","ibm,rgmii"
- - reg : <registers mapping>
- - revision : as provided by the RGMII new version register if
- available.
- For Axon: 0x0000012a
-
- d) Xilinx IP cores
-
- The Xilinx EDK toolchain ships with a set of IP cores (devices) for use
- in Xilinx Spartan and Virtex FPGAs. The devices cover the whole range
- of standard device types (network, serial, etc.) and miscellanious
- devices (gpio, LCD, spi, etc). Also, since these devices are
- implemented within the fpga fabric every instance of the device can be
- synthesised with different options that change the behaviour.
-
- Each IP-core has a set of parameters which the FPGA designer can use to
- control how the core is synthesized. Historically, the EDK tool would
- extract the device parameters relevant to device drivers and copy them
- into an 'xparameters.h' in the form of #define symbols. This tells the
- device drivers how the IP cores are configured, but it requres the kernel
- to be recompiled every time the FPGA bitstream is resynthesized.
-
- The new approach is to export the parameters into the device tree and
- generate a new device tree each time the FPGA bitstream changes. The
- parameters which used to be exported as #defines will now become
- properties of the device node. In general, device nodes for IP-cores
- will take the following form:
-
- (name): (generic-name)@(base-address) {
- compatible = "xlnx,(ip-core-name)-(HW_VER)"
- [, (list of compatible devices), ...];
- reg = <(baseaddr) (size)>;
- interrupt-parent = <&interrupt-controller-phandle>;
- interrupts = < ... >;
- xlnx,(parameter1) = "(string-value)";
- xlnx,(parameter2) = <(int-value)>;
- };
-
- (generic-name): an open firmware-style name that describes the
- generic class of device. Preferably, this is one word, such
- as 'serial' or 'ethernet'.
- (ip-core-name): the name of the ip block (given after the BEGIN
- directive in system.mhs). Should be in lowercase
- and all underscores '_' converted to dashes '-'.
- (name): is derived from the "PARAMETER INSTANCE" value.
- (parameter#): C_* parameters from system.mhs. The C_ prefix is
- dropped from the parameter name, the name is converted
- to lowercase and all underscore '_' characters are
- converted to dashes '-'.
- (baseaddr): the baseaddr parameter value (often named C_BASEADDR).
- (HW_VER): from the HW_VER parameter.
- (size): the address range size (often C_HIGHADDR - C_BASEADDR + 1).
-
- Typically, the compatible list will include the exact IP core version
- followed by an older IP core version which implements the same
- interface or any other device with the same interface.
-
- 'reg', 'interrupt-parent' and 'interrupts' are all optional properties.
-
- For example, the following block from system.mhs:
-
- BEGIN opb_uartlite
- PARAMETER INSTANCE = opb_uartlite_0
- PARAMETER HW_VER = 1.00.b
- PARAMETER C_BAUDRATE = 115200
- PARAMETER C_DATA_BITS = 8
- PARAMETER C_ODD_PARITY = 0
- PARAMETER C_USE_PARITY = 0
- PARAMETER C_CLK_FREQ = 50000000
- PARAMETER C_BASEADDR = 0xEC100000
- PARAMETER C_HIGHADDR = 0xEC10FFFF
- BUS_INTERFACE SOPB = opb_7
- PORT OPB_Clk = CLK_50MHz
- PORT Interrupt = opb_uartlite_0_Interrupt
- PORT RX = opb_uartlite_0_RX
- PORT TX = opb_uartlite_0_TX
- PORT OPB_Rst = sys_bus_reset_0
- END
-
- becomes the following device tree node:
-
- opb_uartlite_0: serial@ec100000 {
- device_type = "serial";
- compatible = "xlnx,opb-uartlite-1.00.b";
- reg = <ec100000 10000>;
- interrupt-parent = <&opb_intc_0>;
- interrupts = <1 0>; // got this from the opb_intc parameters
- current-speed = <d#115200>; // standard serial device prop
- clock-frequency = <d#50000000>; // standard serial device prop
- xlnx,data-bits = <8>;
- xlnx,odd-parity = <0>;
- xlnx,use-parity = <0>;
- };
-
- Some IP cores actually implement 2 or more logical devices. In
- this case, the device should still describe the whole IP core with
- a single node and add a child node for each logical device. The
- ranges property can be used to translate from parent IP-core to the
- registers of each device. In addition, the parent node should be
- compatible with the bus type 'xlnx,compound', and should contain
- #address-cells and #size-cells, as with any other bus. (Note: this
- makes the assumption that both logical devices have the same bus
- binding. If this is not true, then separate nodes should be used
- for each logical device). The 'cell-index' property can be used to
- enumerate logical devices within an IP core. For example, the
- following is the system.mhs entry for the dual ps2 controller found
- on the ml403 reference design.
-
- BEGIN opb_ps2_dual_ref
- PARAMETER INSTANCE = opb_ps2_dual_ref_0
- PARAMETER HW_VER = 1.00.a
- PARAMETER C_BASEADDR = 0xA9000000
- PARAMETER C_HIGHADDR = 0xA9001FFF
- BUS_INTERFACE SOPB = opb_v20_0
- PORT Sys_Intr1 = ps2_1_intr
- PORT Sys_Intr2 = ps2_2_intr
- PORT Clkin1 = ps2_clk_rx_1
- PORT Clkin2 = ps2_clk_rx_2
- PORT Clkpd1 = ps2_clk_tx_1
- PORT Clkpd2 = ps2_clk_tx_2
- PORT Rx1 = ps2_d_rx_1
- PORT Rx2 = ps2_d_rx_2
- PORT Txpd1 = ps2_d_tx_1
- PORT Txpd2 = ps2_d_tx_2
- END
-
- It would result in the following device tree nodes:
-
- opb_ps2_dual_ref_0: opb-ps2-dual-ref@a9000000 {
- #address-cells = <1>;
- #size-cells = <1>;
- compatible = "xlnx,compound";
- ranges = <0 a9000000 2000>;
- // If this device had extra parameters, then they would
- // go here.
- ps2@0 {
- compatible = "xlnx,opb-ps2-dual-ref-1.00.a";
- reg = <0 40>;
- interrupt-parent = <&opb_intc_0>;
- interrupts = <3 0>;
- cell-index = <0>;
- };
- ps2@1000 {
- compatible = "xlnx,opb-ps2-dual-ref-1.00.a";
- reg = <1000 40>;
- interrupt-parent = <&opb_intc_0>;
- interrupts = <3 0>;
- cell-index = <0>;
- };
- };
-
- Also, the system.mhs file defines bus attachments from the processor
- to the devices. The device tree structure should reflect the bus
- attachments. Again an example; this system.mhs fragment:
-
- BEGIN ppc405_virtex4
- PARAMETER INSTANCE = ppc405_0
- PARAMETER HW_VER = 1.01.a
- BUS_INTERFACE DPLB = plb_v34_0
- BUS_INTERFACE IPLB = plb_v34_0
- END
-
- BEGIN opb_intc
- PARAMETER INSTANCE = opb_intc_0
- PARAMETER HW_VER = 1.00.c
- PARAMETER C_BASEADDR = 0xD1000FC0
- PARAMETER C_HIGHADDR = 0xD1000FDF
- BUS_INTERFACE SOPB = opb_v20_0
- END
-
- BEGIN opb_uart16550
- PARAMETER INSTANCE = opb_uart16550_0
- PARAMETER HW_VER = 1.00.d
- PARAMETER C_BASEADDR = 0xa0000000
- PARAMETER C_HIGHADDR = 0xa0001FFF
- BUS_INTERFACE SOPB = opb_v20_0
- END
-
- BEGIN plb_v34
- PARAMETER INSTANCE = plb_v34_0
- PARAMETER HW_VER = 1.02.a
- END
-
- BEGIN plb_bram_if_cntlr
- PARAMETER INSTANCE = plb_bram_if_cntlr_0
- PARAMETER HW_VER = 1.00.b
- PARAMETER C_BASEADDR = 0xFFFF0000
- PARAMETER C_HIGHADDR = 0xFFFFFFFF
- BUS_INTERFACE SPLB = plb_v34_0
- END
-
- BEGIN plb2opb_bridge
- PARAMETER INSTANCE = plb2opb_bridge_0
- PARAMETER HW_VER = 1.01.a
- PARAMETER C_RNG0_BASEADDR = 0x20000000
- PARAMETER C_RNG0_HIGHADDR = 0x3FFFFFFF
- PARAMETER C_RNG1_BASEADDR = 0x60000000
- PARAMETER C_RNG1_HIGHADDR = 0x7FFFFFFF
- PARAMETER C_RNG2_BASEADDR = 0x80000000
- PARAMETER C_RNG2_HIGHADDR = 0xBFFFFFFF
- PARAMETER C_RNG3_BASEADDR = 0xC0000000
- PARAMETER C_RNG3_HIGHADDR = 0xDFFFFFFF
- BUS_INTERFACE SPLB = plb_v34_0
- BUS_INTERFACE MOPB = opb_v20_0
- END
-
- Gives this device tree (some properties removed for clarity):
-
- plb@0 {
- #address-cells = <1>;
- #size-cells = <1>;
- compatible = "xlnx,plb-v34-1.02.a";
- device_type = "ibm,plb";
- ranges; // 1:1 translation
-
- plb_bram_if_cntrl_0: bram@ffff0000 {
- reg = <ffff0000 10000>;
- }
-
- opb@20000000 {
- #address-cells = <1>;
- #size-cells = <1>;
- ranges = <20000000 20000000 20000000
- 60000000 60000000 20000000
- 80000000 80000000 40000000
- c0000000 c0000000 20000000>;
-
- opb_uart16550_0: serial@a0000000 {
- reg = <a00000000 2000>;
- };
-
- opb_intc_0: interrupt-controller@d1000fc0 {
- reg = <d1000fc0 20>;
- };
- };
- };
-
- That covers the general approach to binding xilinx IP cores into the
- device tree. The following are bindings for specific devices:
-
- i) Xilinx ML300 Framebuffer
-
- Simple framebuffer device from the ML300 reference design (also on the
- ML403 reference design as well as others).
-
- Optional properties:
- - resolution = <xres yres> : pixel resolution of framebuffer. Some
- implementations use a different resolution.
- Default is <d#640 d#480>
- - virt-resolution = <xvirt yvirt> : Size of framebuffer in memory.
- Default is <d#1024 d#480>.
- - rotate-display (empty) : rotate display 180 degrees.
-
- ii) Xilinx SystemACE
-
- The Xilinx SystemACE device is used to program FPGAs from an FPGA
- bitstream stored on a CF card. It can also be used as a generic CF
- interface device.
-
- Optional properties:
- - 8-bit (empty) : Set this property for SystemACE in 8 bit mode
-
- iii) Xilinx EMAC and Xilinx TEMAC
-
- Xilinx Ethernet devices. In addition to general xilinx properties
- listed above, nodes for these devices should include a phy-handle
- property, and may include other common network device properties
- like local-mac-address.
-
- iv) Xilinx Uartlite
-
- Xilinx uartlite devices are simple fixed speed serial ports.
-
- Required properties:
- - current-speed : Baud rate of uartlite
-
- v) Xilinx hwicap
-
- Xilinx hwicap devices provide access to the configuration logic
- of the FPGA through the Internal Configuration Access Port
- (ICAP). The ICAP enables partial reconfiguration of the FPGA,
- readback of the configuration information, and some control over
- 'warm boots' of the FPGA fabric.
-
- Required properties:
- - xlnx,family : The family of the FPGA, necessary since the
- capabilities of the underlying ICAP hardware
- differ between different families. May be
- 'virtex2p', 'virtex4', or 'virtex5'.
-
- vi) Xilinx Uart 16550
-
- Xilinx UART 16550 devices are very similar to the NS16550 but with
- different register spacing and an offset from the base address.
-
- Required properties:
- - clock-frequency : Frequency of the clock input
- - reg-offset : A value of 3 is required
- - reg-shift : A value of 2 is required
-
- e) USB EHCI controllers
-
- Required properties:
- - compatible : should be "usb-ehci".
- - reg : should contain at least address and length of the standard EHCI
- register set for the device. Optional platform-dependent registers
- (debug-port or other) can be also specified here, but only after
- definition of standard EHCI registers.
- - interrupts : one EHCI interrupt should be described here.
- If device registers are implemented in big endian mode, the device
- node should have "big-endian-regs" property.
- If controller implementation operates with big endian descriptors,
- "big-endian-desc" property should be specified.
- If both big endian registers and descriptors are used by the controller
- implementation, "big-endian" property can be specified instead of having
- both "big-endian-regs" and "big-endian-desc".
-
- Example (Sequoia 440EPx):
- ehci@e0000300 {
- compatible = "ibm,usb-ehci-440epx", "usb-ehci";
- interrupt-parent = <&UIC0>;
- interrupts = <1a 4>;
- reg = <0 e0000300 90 0 e0000390 70>;
- big-endian;
- };
-
- f) MDIO on GPIOs
-
- Currently defined compatibles:
- - virtual,gpio-mdio
-
- MDC and MDIO lines connected to GPIO controllers are listed in the
- gpios property as described in section VIII.1 in the following order:
-
- MDC, MDIO.
-
- Example:
-
- mdio {
- compatible = "virtual,mdio-gpio";
- #address-cells = <1>;
- #size-cells = <0>;
- gpios = <&qe_pio_a 11
- &qe_pio_c 6>;
- };
-
- g) SPI (Serial Peripheral Interface) busses
-
- SPI busses can be described with a node for the SPI master device
- and a set of child nodes for each SPI slave on the bus. For this
- discussion, it is assumed that the system's SPI controller is in
- SPI master mode. This binding does not describe SPI controllers
- in slave mode.
-
- The SPI master node requires the following properties:
- - #address-cells - number of cells required to define a chip select
- address on the SPI bus.
- - #size-cells - should be zero.
- - compatible - name of SPI bus controller following generic names
- recommended practice.
- No other properties are required in the SPI bus node. It is assumed
- that a driver for an SPI bus device will understand that it is an SPI bus.
- However, the binding does not attempt to define the specific method for
- assigning chip select numbers. Since SPI chip select configuration is
- flexible and non-standardized, it is left out of this binding with the
- assumption that board specific platform code will be used to manage
- chip selects. Individual drivers can define additional properties to
- support describing the chip select layout.
-
- SPI slave nodes must be children of the SPI master node and can
- contain the following properties.
- - reg - (required) chip select address of device.
- - compatible - (required) name of SPI device following generic names
- recommended practice
- - spi-max-frequency - (required) Maximum SPI clocking speed of device in Hz
- - spi-cpol - (optional) Empty property indicating device requires
- inverse clock polarity (CPOL) mode
- - spi-cpha - (optional) Empty property indicating device requires
- shifted clock phase (CPHA) mode
- - spi-cs-high - (optional) Empty property indicating device requires
- chip select active high
-
- SPI example for an MPC5200 SPI bus:
- spi@f00 {
- #address-cells = <1>;
- #size-cells = <0>;
- compatible = "fsl,mpc5200b-spi","fsl,mpc5200-spi";
- reg = <0xf00 0x20>;
- interrupts = <2 13 0 2 14 0>;
- interrupt-parent = <&mpc5200_pic>;
-
- ethernet-switch@0 {
- compatible = "micrel,ks8995m";
- spi-max-frequency = <1000000>;
- reg = <0>;
- };
-
- codec@1 {
- compatible = "ti,tlv320aic26";
- spi-max-frequency = <100000>;
- reg = <1>;
- };
- };
-
-VII - Marvell Discovery mv64[345]6x System Controller chips
-===========================================================
-
-The Marvell mv64[345]60 series of system controller chips contain
-many of the peripherals needed to implement a complete computer
-system. In this section, we define device tree nodes to describe
-the system controller chip itself and each of the peripherals
-which it contains. Compatible string values for each node are
-prefixed with the string "marvell,", for Marvell Technology Group Ltd.
-
-1) The /system-controller node
-
- This node is used to represent the system-controller and must be
- present when the system uses a system controller chip. The top-level
- system-controller node contains information that is global to all
- devices within the system controller chip. The node name begins
- with "system-controller" followed by the unit address, which is
- the base address of the memory-mapped register set for the system
- controller chip.
-
- Required properties:
-
- - ranges : Describes the translation of system controller addresses
- for memory mapped registers.
- - clock-frequency: Contains the main clock frequency for the system
- controller chip.
- - reg : This property defines the address and size of the
- memory-mapped registers contained within the system controller
- chip. The address specified in the "reg" property should match
- the unit address of the system-controller node.
- - #address-cells : Address representation for system controller
- devices. This field represents the number of cells needed to
- represent the address of the memory-mapped registers of devices
- within the system controller chip.
- - #size-cells : Size representation for for the memory-mapped
- registers within the system controller chip.
- - #interrupt-cells : Defines the width of cells used to represent
- interrupts.
-
- Optional properties:
-
- - model : The specific model of the system controller chip. Such
- as, "mv64360", "mv64460", or "mv64560".
- - compatible : A string identifying the compatibility identifiers
- of the system controller chip.
-
- The system-controller node contains child nodes for each system
- controller device that the platform uses. Nodes should not be created
- for devices which exist on the system controller chip but are not used
-
- Example Marvell Discovery mv64360 system-controller node:
-
- system-controller@f1000000 { /* Marvell Discovery mv64360 */
- #address-cells = <1>;
- #size-cells = <1>;
- model = "mv64360"; /* Default */
- compatible = "marvell,mv64360";
- clock-frequency = <133333333>;
- reg = <0xf1000000 0x10000>;
- virtual-reg = <0xf1000000>;
- ranges = <0x88000000 0x88000000 0x1000000 /* PCI 0 I/O Space */
- 0x80000000 0x80000000 0x8000000 /* PCI 0 MEM Space */
- 0xa0000000 0xa0000000 0x4000000 /* User FLASH */
- 0x00000000 0xf1000000 0x0010000 /* Bridge's regs */
- 0xf2000000 0xf2000000 0x0040000>;/* Integrated SRAM */
-
- [ child node definitions... ]
- }
-
-2) Child nodes of /system-controller
-
- a) Marvell Discovery MDIO bus
-
- The MDIO is a bus to which the PHY devices are connected. For each
- device that exists on this bus, a child node should be created. See
- the definition of the PHY node below for an example of how to define
- a PHY.
-
- Required properties:
- - #address-cells : Should be <1>
- - #size-cells : Should be <0>
- - device_type : Should be "mdio"
- - compatible : Should be "marvell,mv64360-mdio"
-
- Example:
-
- mdio {
- #address-cells = <1>;
- #size-cells = <0>;
- device_type = "mdio";
- compatible = "marvell,mv64360-mdio";
-
- ethernet-phy@0 {
- ......
- };
- };
-
-
- b) Marvell Discovery ethernet controller
-
- The Discover ethernet controller is described with two levels
- of nodes. The first level describes an ethernet silicon block
- and the second level describes up to 3 ethernet nodes within
- that block. The reason for the multiple levels is that the
- registers for the node are interleaved within a single set
- of registers. The "ethernet-block" level describes the
- shared register set, and the "ethernet" nodes describe ethernet
- port-specific properties.
-
- Ethernet block node
-
- Required properties:
- - #address-cells : <1>
- - #size-cells : <0>
- - compatible : "marvell,mv64360-eth-block"
- - reg : Offset and length of the register set for this block
-
- Example Discovery Ethernet block node:
- ethernet-block@2000 {
- #address-cells = <1>;
- #size-cells = <0>;
- compatible = "marvell,mv64360-eth-block";
- reg = <0x2000 0x2000>;
- ethernet@0 {
- .......
- };
- };
-
- Ethernet port node
-
- Required properties:
- - device_type : Should be "network".
- - compatible : Should be "marvell,mv64360-eth".
- - reg : Should be <0>, <1>, or <2>, according to which registers
- within the silicon block the device uses.
- - interrupts : <a> where a is the interrupt number for the port.
- - interrupt-parent : the phandle for the interrupt controller
- that services interrupts for this device.
- - phy : the phandle for the PHY connected to this ethernet
- controller.
- - local-mac-address : 6 bytes, MAC address
-
- Example Discovery Ethernet port node:
- ethernet@0 {
- device_type = "network";
- compatible = "marvell,mv64360-eth";
- reg = <0>;
- interrupts = <32>;
- interrupt-parent = <&PIC>;
- phy = <&PHY0>;
- local-mac-address = [ 00 00 00 00 00 00 ];
- };
-
-
-
- c) Marvell Discovery PHY nodes
-
- Required properties:
- - device_type : Should be "ethernet-phy"
- - interrupts : <a> where a is the interrupt number for this phy.
- - interrupt-parent : the phandle for the interrupt controller that
- services interrupts for this device.
- - reg : The ID number for the phy, usually a small integer
-
- Example Discovery PHY node:
- ethernet-phy@1 {
- device_type = "ethernet-phy";
- compatible = "broadcom,bcm5421";
- interrupts = <76>; /* GPP 12 */
- interrupt-parent = <&PIC>;
- reg = <1>;
- };
-
-
- d) Marvell Discovery SDMA nodes
-
- Represent DMA hardware associated with the MPSC (multiprotocol
- serial controllers).
-
- Required properties:
- - compatible : "marvell,mv64360-sdma"
- - reg : Offset and length of the register set for this device
- - interrupts : <a> where a is the interrupt number for the DMA
- device.
- - interrupt-parent : the phandle for the interrupt controller
- that services interrupts for this device.
-
- Example Discovery SDMA node:
- sdma@4000 {
- compatible = "marvell,mv64360-sdma";
- reg = <0x4000 0xc18>;
- virtual-reg = <0xf1004000>;
- interrupts = <36>;
- interrupt-parent = <&PIC>;
- };
-
-
- e) Marvell Discovery BRG nodes
-
- Represent baud rate generator hardware associated with the MPSC
- (multiprotocol serial controllers).
-
- Required properties:
- - compatible : "marvell,mv64360-brg"
- - reg : Offset and length of the register set for this device
- - clock-src : A value from 0 to 15 which selects the clock
- source for the baud rate generator. This value corresponds
- to the CLKS value in the BRGx configuration register. See
- the mv64x60 User's Manual.
- - clock-frequence : The frequency (in Hz) of the baud rate
- generator's input clock.
- - current-speed : The current speed setting (presumably by
- firmware) of the baud rate generator.
-
- Example Discovery BRG node:
- brg@b200 {
- compatible = "marvell,mv64360-brg";
- reg = <0xb200 0x8>;
- clock-src = <8>;
- clock-frequency = <133333333>;
- current-speed = <9600>;
- };
-
-
- f) Marvell Discovery CUNIT nodes
-
- Represent the Serial Communications Unit device hardware.
-
- Required properties:
- - reg : Offset and length of the register set for this device
-
- Example Discovery CUNIT node:
- cunit@f200 {
- reg = <0xf200 0x200>;
- };
-
-
- g) Marvell Discovery MPSCROUTING nodes
-
- Represent the Discovery's MPSC routing hardware
-
- Required properties:
- - reg : Offset and length of the register set for this device
-
- Example Discovery CUNIT node:
- mpscrouting@b500 {
- reg = <0xb400 0xc>;
- };
-
-
- h) Marvell Discovery MPSCINTR nodes
-
- Represent the Discovery's MPSC DMA interrupt hardware registers
- (SDMA cause and mask registers).
-
- Required properties:
- - reg : Offset and length of the register set for this device
-
- Example Discovery MPSCINTR node:
- mpsintr@b800 {
- reg = <0xb800 0x100>;
- };
-
-
- i) Marvell Discovery MPSC nodes
-
- Represent the Discovery's MPSC (Multiprotocol Serial Controller)
- serial port.
-
- Required properties:
- - device_type : "serial"
- - compatible : "marvell,mv64360-mpsc"
- - reg : Offset and length of the register set for this device
- - sdma : the phandle for the SDMA node used by this port
- - brg : the phandle for the BRG node used by this port
- - cunit : the phandle for the CUNIT node used by this port
- - mpscrouting : the phandle for the MPSCROUTING node used by this port
- - mpscintr : the phandle for the MPSCINTR node used by this port
- - cell-index : the hardware index of this cell in the MPSC core
- - max_idle : value needed for MPSC CHR3 (Maximum Frame Length)
- register
- - interrupts : <a> where a is the interrupt number for the MPSC.
- - interrupt-parent : the phandle for the interrupt controller
- that services interrupts for this device.
-
- Example Discovery MPSCINTR node:
- mpsc@8000 {
- device_type = "serial";
- compatible = "marvell,mv64360-mpsc";
- reg = <0x8000 0x38>;
- virtual-reg = <0xf1008000>;
- sdma = <&SDMA0>;
- brg = <&BRG0>;
- cunit = <&CUNIT>;
- mpscrouting = <&MPSCROUTING>;
- mpscintr = <&MPSCINTR>;
- cell-index = <0>;
- max_idle = <40>;
- interrupts = <40>;
- interrupt-parent = <&PIC>;
- };
-
-
- j) Marvell Discovery Watch Dog Timer nodes
-
- Represent the Discovery's watchdog timer hardware
-
- Required properties:
- - compatible : "marvell,mv64360-wdt"
- - reg : Offset and length of the register set for this device
-
- Example Discovery Watch Dog Timer node:
- wdt@b410 {
- compatible = "marvell,mv64360-wdt";
- reg = <0xb410 0x8>;
- };
-
-
- k) Marvell Discovery I2C nodes
-
- Represent the Discovery's I2C hardware
-
- Required properties:
- - device_type : "i2c"
- - compatible : "marvell,mv64360-i2c"
- - reg : Offset and length of the register set for this device
- - interrupts : <a> where a is the interrupt number for the I2C.
- - interrupt-parent : the phandle for the interrupt controller
- that services interrupts for this device.
-
- Example Discovery I2C node:
- compatible = "marvell,mv64360-i2c";
- reg = <0xc000 0x20>;
- virtual-reg = <0xf100c000>;
- interrupts = <37>;
- interrupt-parent = <&PIC>;
- };
-
-
- l) Marvell Discovery PIC (Programmable Interrupt Controller) nodes
-
- Represent the Discovery's PIC hardware
-
- Required properties:
- - #interrupt-cells : <1>
- - #address-cells : <0>
- - compatible : "marvell,mv64360-pic"
- - reg : Offset and length of the register set for this device
- - interrupt-controller
-
- Example Discovery PIC node:
- pic {
- #interrupt-cells = <1>;
- #address-cells = <0>;
- compatible = "marvell,mv64360-pic";
- reg = <0x0 0x88>;
- interrupt-controller;
- };
-
-
- m) Marvell Discovery MPP (Multipurpose Pins) multiplexing nodes
-
- Represent the Discovery's MPP hardware
-
- Required properties:
- - compatible : "marvell,mv64360-mpp"
- - reg : Offset and length of the register set for this device
-
- Example Discovery MPP node:
- mpp@f000 {
- compatible = "marvell,mv64360-mpp";
- reg = <0xf000 0x10>;
- };
-
-
- n) Marvell Discovery GPP (General Purpose Pins) nodes
-
- Represent the Discovery's GPP hardware
-
- Required properties:
- - compatible : "marvell,mv64360-gpp"
- - reg : Offset and length of the register set for this device
-
- Example Discovery GPP node:
- gpp@f000 {
- compatible = "marvell,mv64360-gpp";
- reg = <0xf100 0x20>;
- };
-
-
- o) Marvell Discovery PCI host bridge node
-
- Represents the Discovery's PCI host bridge device. The properties
- for this node conform to Rev 2.1 of the PCI Bus Binding to IEEE
- 1275-1994. A typical value for the compatible property is
- "marvell,mv64360-pci".
-
- Example Discovery PCI host bridge node
- pci@80000000 {
- #address-cells = <3>;
- #size-cells = <2>;
- #interrupt-cells = <1>;
- device_type = "pci";
- compatible = "marvell,mv64360-pci";
- reg = <0xcf8 0x8>;
- ranges = <0x01000000 0x0 0x0
- 0x88000000 0x0 0x01000000
- 0x02000000 0x0 0x80000000
- 0x80000000 0x0 0x08000000>;
- bus-range = <0 255>;
- clock-frequency = <66000000>;
- interrupt-parent = <&PIC>;
- interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
- interrupt-map = <
- /* IDSEL 0x0a */
- 0x5000 0 0 1 &PIC 80
- 0x5000 0 0 2 &PIC 81
- 0x5000 0 0 3 &PIC 91
- 0x5000 0 0 4 &PIC 93
-
- /* IDSEL 0x0b */
- 0x5800 0 0 1 &PIC 91
- 0x5800 0 0 2 &PIC 93
- 0x5800 0 0 3 &PIC 80
- 0x5800 0 0 4 &PIC 81
-
- /* IDSEL 0x0c */
- 0x6000 0 0 1 &PIC 91
- 0x6000 0 0 2 &PIC 93
- 0x6000 0 0 3 &PIC 80
- 0x6000 0 0 4 &PIC 81
-
- /* IDSEL 0x0d */
- 0x6800 0 0 1 &PIC 93
- 0x6800 0 0 2 &PIC 80
- 0x6800 0 0 3 &PIC 81
- 0x6800 0 0 4 &PIC 91
- >;
- };
-
-
- p) Marvell Discovery CPU Error nodes
-
- Represent the Discovery's CPU error handler device.
-
- Required properties:
- - compatible : "marvell,mv64360-cpu-error"
- - reg : Offset and length of the register set for this device
- - interrupts : the interrupt number for this device
- - interrupt-parent : the phandle for the interrupt controller
- that services interrupts for this device.
-
- Example Discovery CPU Error node:
- cpu-error@0070 {
- compatible = "marvell,mv64360-cpu-error";
- reg = <0x70 0x10 0x128 0x28>;
- interrupts = <3>;
- interrupt-parent = <&PIC>;
- };
-
-
- q) Marvell Discovery SRAM Controller nodes
-
- Represent the Discovery's SRAM controller device.
-
- Required properties:
- - compatible : "marvell,mv64360-sram-ctrl"
- - reg : Offset and length of the register set for this device
- - interrupts : the interrupt number for this device
- - interrupt-parent : the phandle for the interrupt controller
- that services interrupts for this device.
-
- Example Discovery SRAM Controller node:
- sram-ctrl@0380 {
- compatible = "marvell,mv64360-sram-ctrl";
- reg = <0x380 0x80>;
- interrupts = <13>;
- interrupt-parent = <&PIC>;
- };
-
-
- r) Marvell Discovery PCI Error Handler nodes
-
- Represent the Discovery's PCI error handler device.
-
- Required properties:
- - compatible : "marvell,mv64360-pci-error"
- - reg : Offset and length of the register set for this device
- - interrupts : the interrupt number for this device
- - interrupt-parent : the phandle for the interrupt controller
- that services interrupts for this device.
-
- Example Discovery PCI Error Handler node:
- pci-error@1d40 {
- compatible = "marvell,mv64360-pci-error";
- reg = <0x1d40 0x40 0xc28 0x4>;
- interrupts = <12>;
- interrupt-parent = <&PIC>;
- };
-
-
- s) Marvell Discovery Memory Controller nodes
-
- Represent the Discovery's memory controller device.
-
- Required properties:
- - compatible : "marvell,mv64360-mem-ctrl"
- - reg : Offset and length of the register set for this device
- - interrupts : the interrupt number for this device
- - interrupt-parent : the phandle for the interrupt controller
- that services interrupts for this device.
-
- Example Discovery Memory Controller node:
- mem-ctrl@1400 {
- compatible = "marvell,mv64360-mem-ctrl";
- reg = <0x1400 0x60>;
- interrupts = <17>;
- interrupt-parent = <&PIC>;
- };
-
-
-VIII - Specifying interrupt information for devices
+VII - Specifying interrupt information for devices
===================================================
The device tree represents the busses and devices of a hardware
@@ -2439,56 +1324,7 @@ encodings listed below:
2 = high to low edge sensitive type enabled
3 = low to high edge sensitive type enabled
-IX - Specifying GPIO information for devices
-============================================
-
-1) gpios property
------------------
-
-Nodes that makes use of GPIOs should define them using `gpios' property,
-format of which is: <&gpio-controller1-phandle gpio1-specifier
- &gpio-controller2-phandle gpio2-specifier
- 0 /* holes are permitted, means no GPIO 3 */
- &gpio-controller4-phandle gpio4-specifier
- ...>;
-
-Note that gpio-specifier length is controller dependent.
-
-gpio-specifier may encode: bank, pin position inside the bank,
-whether pin is open-drain and whether pin is logically inverted.
-
-Example of the node using GPIOs:
-
- node {
- gpios = <&qe_pio_e 18 0>;
- };
-
-In this example gpio-specifier is "18 0" and encodes GPIO pin number,
-and empty GPIO flags as accepted by the "qe_pio_e" gpio-controller.
-
-2) gpio-controller nodes
-------------------------
-
-Every GPIO controller node must have #gpio-cells property defined,
-this information will be used to translate gpio-specifiers.
-
-Example of two SOC GPIO banks defined as gpio-controller nodes:
-
- qe_pio_a: gpio-controller@1400 {
- #gpio-cells = <2>;
- compatible = "fsl,qe-pario-bank-a", "fsl,qe-pario-bank";
- reg = <0x1400 0x18>;
- gpio-controller;
- };
-
- qe_pio_e: gpio-controller@1460 {
- #gpio-cells = <2>;
- compatible = "fsl,qe-pario-bank-e", "fsl,qe-pario-bank";
- reg = <0x1460 0x18>;
- gpio-controller;
- };
-
-X - Specifying Device Power Management Information (sleep property)
+VIII - Specifying Device Power Management Information (sleep property)
===================================================================
Devices on SOCs often have mechanisms for placing devices into low-power
diff --git a/Documentation/powerpc/dts-bindings/4xx/emac.txt b/Documentation/powerpc/dts-bindings/4xx/emac.txt
new file mode 100644
index 00000000000..2161334a7ca
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/4xx/emac.txt
@@ -0,0 +1,148 @@
+ 4xx/Axon EMAC ethernet nodes
+
+ The EMAC ethernet controller in IBM and AMCC 4xx chips, and also
+ the Axon bridge. To operate this needs to interact with a ths
+ special McMAL DMA controller, and sometimes an RGMII or ZMII
+ interface. In addition to the nodes and properties described
+ below, the node for the OPB bus on which the EMAC sits must have a
+ correct clock-frequency property.
+
+ i) The EMAC node itself
+
+ Required properties:
+ - device_type : "network"
+
+ - compatible : compatible list, contains 2 entries, first is
+ "ibm,emac-CHIP" where CHIP is the host ASIC (440gx,
+ 405gp, Axon) and second is either "ibm,emac" or
+ "ibm,emac4". For Axon, thus, we have: "ibm,emac-axon",
+ "ibm,emac4"
+ - interrupts : <interrupt mapping for EMAC IRQ and WOL IRQ>
+ - interrupt-parent : optional, if needed for interrupt mapping
+ - reg : <registers mapping>
+ - local-mac-address : 6 bytes, MAC address
+ - mal-device : phandle of the associated McMAL node
+ - mal-tx-channel : 1 cell, index of the tx channel on McMAL associated
+ with this EMAC
+ - mal-rx-channel : 1 cell, index of the rx channel on McMAL associated
+ with this EMAC
+ - cell-index : 1 cell, hardware index of the EMAC cell on a given
+ ASIC (typically 0x0 and 0x1 for EMAC0 and EMAC1 on
+ each Axon chip)
+ - max-frame-size : 1 cell, maximum frame size supported in bytes
+ - rx-fifo-size : 1 cell, Rx fifo size in bytes for 10 and 100 Mb/sec
+ operations.
+ For Axon, 2048
+ - tx-fifo-size : 1 cell, Tx fifo size in bytes for 10 and 100 Mb/sec
+ operations.
+ For Axon, 2048.
+ - fifo-entry-size : 1 cell, size of a fifo entry (used to calculate
+ thresholds).
+ For Axon, 0x00000010
+ - mal-burst-size : 1 cell, MAL burst size (used to calculate thresholds)
+ in bytes.
+ For Axon, 0x00000100 (I think ...)
+ - phy-mode : string, mode of operations of the PHY interface.
+ Supported values are: "mii", "rmii", "smii", "rgmii",
+ "tbi", "gmii", rtbi", "sgmii".
+ For Axon on CAB, it is "rgmii"
+ - mdio-device : 1 cell, required iff using shared MDIO registers
+ (440EP). phandle of the EMAC to use to drive the
+ MDIO lines for the PHY used by this EMAC.
+ - zmii-device : 1 cell, required iff connected to a ZMII. phandle of
+ the ZMII device node
+ - zmii-channel : 1 cell, required iff connected to a ZMII. Which ZMII
+ channel or 0xffffffff if ZMII is only used for MDIO.
+ - rgmii-device : 1 cell, required iff connected to an RGMII. phandle
+ of the RGMII device node.
+ For Axon: phandle of plb5/plb4/opb/rgmii
+ - rgmii-channel : 1 cell, required iff connected to an RGMII. Which
+ RGMII channel is used by this EMAC.
+ Fox Axon: present, whatever value is appropriate for each
+ EMAC, that is the content of the current (bogus) "phy-port"
+ property.
+
+ Optional properties:
+ - phy-address : 1 cell, optional, MDIO address of the PHY. If absent,
+ a search is performed.
+ - phy-map : 1 cell, optional, bitmap of addresses to probe the PHY
+ for, used if phy-address is absent. bit 0x00000001 is
+ MDIO address 0.
+ For Axon it can be absent, though my current driver
+ doesn't handle phy-address yet so for now, keep
+ 0x00ffffff in it.
+ - rx-fifo-size-gige : 1 cell, Rx fifo size in bytes for 1000 Mb/sec
+ operations (if absent the value is the same as
+ rx-fifo-size). For Axon, either absent or 2048.
+ - tx-fifo-size-gige : 1 cell, Tx fifo size in bytes for 1000 Mb/sec
+ operations (if absent the value is the same as
+ tx-fifo-size). For Axon, either absent or 2048.
+ - tah-device : 1 cell, optional. If connected to a TAH engine for
+ offload, phandle of the TAH device node.
+ - tah-channel : 1 cell, optional. If appropriate, channel used on the
+ TAH engine.
+
+ Example:
+
+ EMAC0: ethernet@40000800 {
+ device_type = "network";
+ compatible = "ibm,emac-440gp", "ibm,emac";
+ interrupt-parent = <&UIC1>;
+ interrupts = <1c 4 1d 4>;
+ reg = <40000800 70>;
+ local-mac-address = [00 04 AC E3 1B 1E];
+ mal-device = <&MAL0>;
+ mal-tx-channel = <0 1>;
+ mal-rx-channel = <0>;
+ cell-index = <0>;
+ max-frame-size = <5dc>;
+ rx-fifo-size = <1000>;
+ tx-fifo-size = <800>;
+ phy-mode = "rmii";
+ phy-map = <00000001>;
+ zmii-device = <&ZMII0>;
+ zmii-channel = <0>;
+ };
+
+ ii) McMAL node
+
+ Required properties:
+ - device_type : "dma-controller"
+ - compatible : compatible list, containing 2 entries, first is
+ "ibm,mcmal-CHIP" where CHIP is the host ASIC (like
+ emac) and the second is either "ibm,mcmal" or
+ "ibm,mcmal2".
+ For Axon, "ibm,mcmal-axon","ibm,mcmal2"
+ - interrupts : <interrupt mapping for the MAL interrupts sources:
+ 5 sources: tx_eob, rx_eob, serr, txde, rxde>.
+ For Axon: This is _different_ from the current
+ firmware. We use the "delayed" interrupts for txeob
+ and rxeob. Thus we end up with mapping those 5 MPIC
+ interrupts, all level positive sensitive: 10, 11, 32,
+ 33, 34 (in decimal)
+ - dcr-reg : < DCR registers range >
+ - dcr-parent : if needed for dcr-reg
+ - num-tx-chans : 1 cell, number of Tx channels
+ - num-rx-chans : 1 cell, number of Rx channels
+
+ iii) ZMII node
+
+ Required properties:
+ - compatible : compatible list, containing 2 entries, first is
+ "ibm,zmii-CHIP" where CHIP is the host ASIC (like
+ EMAC) and the second is "ibm,zmii".
+ For Axon, there is no ZMII node.
+ - reg : <registers mapping>
+
+ iv) RGMII node
+
+ Required properties:
+ - compatible : compatible list, containing 2 entries, first is
+ "ibm,rgmii-CHIP" where CHIP is the host ASIC (like
+ EMAC) and the second is "ibm,rgmii".
+ For Axon, "ibm,rgmii-axon","ibm,rgmii"
+ - reg : <registers mapping>
+ - revision : as provided by the RGMII new version register if
+ available.
+ For Axon: 0x0000012a
+
diff --git a/Documentation/powerpc/dts-bindings/can/sja1000.txt b/Documentation/powerpc/dts-bindings/can/sja1000.txt
new file mode 100644
index 00000000000..d6d209ded93
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/can/sja1000.txt
@@ -0,0 +1,53 @@
+Memory mapped SJA1000 CAN controller from NXP (formerly Philips)
+
+Required properties:
+
+- compatible : should be "nxp,sja1000".
+
+- reg : should specify the chip select, address offset and size required
+ to map the registers of the SJA1000. The size is usually 0x80.
+
+- interrupts: property with a value describing the interrupt source
+ (number and sensitivity) required for the SJA1000.
+
+Optional properties:
+
+- nxp,external-clock-frequency : Frequency of the external oscillator
+ clock in Hz. Note that the internal clock frequency used by the
+ SJA1000 is half of that value. If not specified, a default value
+ of 16000000 (16 MHz) is used.
+
+- nxp,tx-output-mode : operation mode of the TX output control logic:
+ <0x0> : bi-phase output mode
+ <0x1> : normal output mode (default)
+ <0x2> : test output mode
+ <0x3> : clock output mode
+
+- nxp,tx-output-config : TX output pin configuration:
+ <0x01> : TX0 invert
+ <0x02> : TX0 pull-down (default)
+ <0x04> : TX0 pull-up
+ <0x06> : TX0 push-pull
+ <0x08> : TX1 invert
+ <0x10> : TX1 pull-down
+ <0x20> : TX1 pull-up
+ <0x30> : TX1 push-pull
+
+- nxp,clock-out-frequency : clock frequency in Hz on the CLKOUT pin.
+ If not specified or if the specified value is 0, the CLKOUT pin
+ will be disabled.
+
+- nxp,no-comparator-bypass : Allows to disable the CAN input comperator.
+
+For futher information, please have a look to the SJA1000 data sheet.
+
+Examples:
+
+can@3,100 {
+ compatible = "nxp,sja1000";
+ reg = <3 0x100 0x80>;
+ interrupts = <2 0>;
+ interrupt-parent = <&mpic>;
+ nxp,external-clock-frequency = <16000000>;
+};
+
diff --git a/Documentation/powerpc/dts-bindings/ecm.txt b/Documentation/powerpc/dts-bindings/ecm.txt
new file mode 100644
index 00000000000..f514f29c67d
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/ecm.txt
@@ -0,0 +1,64 @@
+=====================================================================
+E500 LAW & Coherency Module Device Tree Binding
+Copyright (C) 2009 Freescale Semiconductor Inc.
+=====================================================================
+
+Local Access Window (LAW) Node
+
+The LAW node represents the region of CCSR space where local access
+windows are configured. For ECM based devices this is the first 4k
+of CCSR space that includes CCSRBAR, ALTCBAR, ALTCAR, BPTR, and some
+number of local access windows as specified by fsl,num-laws.
+
+PROPERTIES
+
+ - compatible
+ Usage: required
+ Value type: <string>
+ Definition: Must include "fsl,ecm-law"
+
+ - reg
+ Usage: required
+ Value type: <prop-encoded-array>
+ Definition: A standard property. The value specifies the
+ physical address offset and length of the CCSR space
+ registers.
+
+ - fsl,num-laws
+ Usage: required
+ Value type: <u32>
+ Definition: The value specifies the number of local access
+ windows for this device.
+
+=====================================================================
+
+E500 Coherency Module Node
+
+The E500 LAW node represents the region of CCSR space where ECM config
+and error reporting registers exist, this is the second 4k (0x1000)
+of CCSR space.
+
+PROPERTIES
+
+ - compatible
+ Usage: required
+ Value type: <string>
+ Definition: Must include "fsl,CHIP-ecm", "fsl,ecm" where
+ CHIP is the processor (mpc8572, mpc8544, etc.)
+
+ - reg
+ Usage: required
+ Value type: <prop-encoded-array>
+ Definition: A standard property. The value specifies the
+ physical address offset and length of the CCSR space
+ registers.
+
+ - interrupts
+ Usage: required
+ Value type: <prop-encoded-array>
+
+ - interrupt-parent
+ Usage: required
+ Value type: <phandle>
+
+=====================================================================
diff --git a/Documentation/powerpc/dts-bindings/fsl/board.txt b/Documentation/powerpc/dts-bindings/fsl/board.txt
index 6c974d28eeb..e8b5bc24d0a 100644
--- a/Documentation/powerpc/dts-bindings/fsl/board.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/board.txt
@@ -38,7 +38,7 @@ Required properities:
- reg : Should contain the address and the length of the GPIO bank
register.
- #gpio-cells : Should be two. The first cell is the pin number and the
- second cell is used to specify optional paramters (currently unused).
+ second cell is used to specify optional parameters (currently unused).
- gpio-controller : Marks the port as GPIO controller.
Example:
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm.txt
index 088fc471e03..160c752484b 100644
--- a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm.txt
@@ -19,7 +19,7 @@ Example:
reg = <119c0 30>;
}
-* Properties common to mulitple CPM/QE devices
+* Properties common to multiple CPM/QE devices
- fsl,cpm-command : This value is ORed with the opcode and command flag
to specify the device on which a CPM command operates.
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/gpio.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/gpio.txt
index 1815dfede1b..349f79fd707 100644
--- a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/gpio.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/gpio.txt
@@ -11,7 +11,7 @@ Required properties:
"fsl,cpm1-pario-bank-c", "fsl,cpm1-pario-bank-d",
"fsl,cpm1-pario-bank-e", "fsl,cpm2-pario-bank"
- #gpio-cells : Should be two. The first cell is the pin number and the
- second cell is used to specify optional paramters (currently unused).
+ second cell is used to specify optional parameters (currently unused).
- gpio-controller : Marks the port as GPIO controller.
Example of three SOC GPIO banks defined as gpio-controller nodes:
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt
index 78790d58dc2..6e37be1eeb2 100644
--- a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt
@@ -17,6 +17,9 @@ Required properties:
- model : precise model of the QE, Can be "QE", "CPM", or "CPM2"
- reg : offset and length of the device registers.
- bus-frequency : the clock frequency for QUICC Engine.
+- fsl,qe-num-riscs: define how many RISC engines the QE has.
+- fsl,qe-num-snums: define how many serial number(SNUM) the QE can use for the
+ threads.
Recommended properties
- brg-frequency : the internal clock source frequency for baud-rate
diff --git a/Documentation/powerpc/dts-bindings/fsl/esdhc.txt b/Documentation/powerpc/dts-bindings/fsl/esdhc.txt
index 60084655776..3ed3797b508 100644
--- a/Documentation/powerpc/dts-bindings/fsl/esdhc.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/esdhc.txt
@@ -5,17 +5,18 @@ for MMC, SD, and SDIO types of memory cards.
Required properties:
- compatible : should be
- "fsl,<chip>-esdhc", "fsl,mpc8379-esdhc" for MPC83xx processors.
- "fsl,<chip>-esdhc", "fsl,mpc8536-esdhc" for MPC85xx processors.
+ "fsl,<chip>-esdhc", "fsl,esdhc"
- reg : should contain eSDHC registers location and length.
- interrupts : should contain eSDHC interrupt.
- interrupt-parent : interrupt source phandle.
- clock-frequency : specifies eSDHC base clock frequency.
+ - sdhci,1-bit-only : (optional) specifies that a controller can
+ only handle 1-bit data transfers.
Example:
sdhci@2e000 {
- compatible = "fsl,mpc8378-esdhc", "fsl,mpc8379-esdhc";
+ compatible = "fsl,mpc8378-esdhc", "fsl,esdhc";
reg = <0x2e000 0x1000>;
interrupts = <42 0x8>;
interrupt-parent = <&ipic>;
diff --git a/Documentation/powerpc/dts-bindings/fsl/mcm.txt b/Documentation/powerpc/dts-bindings/fsl/mcm.txt
new file mode 100644
index 00000000000..4ceda9b3b41
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/mcm.txt
@@ -0,0 +1,64 @@
+=====================================================================
+MPX LAW & Coherency Module Device Tree Binding
+Copyright (C) 2009 Freescale Semiconductor Inc.
+=====================================================================
+
+Local Access Window (LAW) Node
+
+The LAW node represents the region of CCSR space where local access
+windows are configured. For MCM based devices this is the first 4k
+of CCSR space that includes CCSRBAR, ALTCBAR, ALTCAR, BPTR, and some
+number of local access windows as specified by fsl,num-laws.
+
+PROPERTIES
+
+ - compatible
+ Usage: required
+ Value type: <string>
+ Definition: Must include "fsl,mcm-law"
+
+ - reg
+ Usage: required
+ Value type: <prop-encoded-array>
+ Definition: A standard property. The value specifies the
+ physical address offset and length of the CCSR space
+ registers.
+
+ - fsl,num-laws
+ Usage: required
+ Value type: <u32>
+ Definition: The value specifies the number of local access
+ windows for this device.
+
+=====================================================================
+
+MPX Coherency Module Node
+
+The MPX LAW node represents the region of CCSR space where MCM config
+and error reporting registers exist, this is the second 4k (0x1000)
+of CCSR space.
+
+PROPERTIES
+
+ - compatible
+ Usage: required
+ Value type: <string>
+ Definition: Must include "fsl,CHIP-mcm", "fsl,mcm" where
+ CHIP is the processor (mpc8641, mpc8610, etc.)
+
+ - reg
+ Usage: required
+ Value type: <prop-encoded-array>
+ Definition: A standard property. The value specifies the
+ physical address offset and length of the CCSR space
+ registers.
+
+ - interrupts
+ Usage: required
+ Value type: <prop-encoded-array>
+
+ - interrupt-parent
+ Usage: required
+ Value type: <phandle>
+
+=====================================================================
diff --git a/Documentation/powerpc/dts-bindings/fsl/msi-pic.txt b/Documentation/powerpc/dts-bindings/fsl/msi-pic.txt
index b26b91992c5..bcc30bac683 100644
--- a/Documentation/powerpc/dts-bindings/fsl/msi-pic.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/msi-pic.txt
@@ -1,6 +1,6 @@
* Freescale MSI interrupt controller
-Reguired properities:
+Required properties:
- compatible : compatible list, contains 2 entries,
first is "fsl,CHIP-msi", where CHIP is the processor(mpc8610, mpc8572,
etc.) and the second is "fsl,mpic-msi" or "fsl,ipic-msi" depending on
diff --git a/Documentation/powerpc/dts-bindings/fsl/pmc.txt b/Documentation/powerpc/dts-bindings/fsl/pmc.txt
index 02f6f43ee1b..07256b7ffca 100644
--- a/Documentation/powerpc/dts-bindings/fsl/pmc.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/pmc.txt
@@ -15,8 +15,8 @@ Properties:
compatible; all statements below that apply to "fsl,mpc8548-pmc" also
apply to "fsl,mpc8641d-pmc".
- Compatibility does not include bit assigments in SCCR/PMCDR/DEVDISR; these
- bit assigments are indicated via the sleep specifier in each device's
+ Compatibility does not include bit assignments in SCCR/PMCDR/DEVDISR; these
+ bit assignments are indicated via the sleep specifier in each device's
sleep property.
- reg: For devices compatible with "fsl,mpc8349-pmc", the first resource
diff --git a/Documentation/powerpc/dts-bindings/gpio/gpio.txt b/Documentation/powerpc/dts-bindings/gpio/gpio.txt
new file mode 100644
index 00000000000..edaa84d288a
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/gpio/gpio.txt
@@ -0,0 +1,50 @@
+Specifying GPIO information for devices
+============================================
+
+1) gpios property
+-----------------
+
+Nodes that makes use of GPIOs should define them using `gpios' property,
+format of which is: <&gpio-controller1-phandle gpio1-specifier
+ &gpio-controller2-phandle gpio2-specifier
+ 0 /* holes are permitted, means no GPIO 3 */
+ &gpio-controller4-phandle gpio4-specifier
+ ...>;
+
+Note that gpio-specifier length is controller dependent.
+
+gpio-specifier may encode: bank, pin position inside the bank,
+whether pin is open-drain and whether pin is logically inverted.
+
+Example of the node using GPIOs:
+
+ node {
+ gpios = <&qe_pio_e 18 0>;
+ };
+
+In this example gpio-specifier is "18 0" and encodes GPIO pin number,
+and empty GPIO flags as accepted by the "qe_pio_e" gpio-controller.
+
+2) gpio-controller nodes
+------------------------
+
+Every GPIO controller node must have #gpio-cells property defined,
+this information will be used to translate gpio-specifiers.
+
+Example of two SOC GPIO banks defined as gpio-controller nodes:
+
+ qe_pio_a: gpio-controller@1400 {
+ #gpio-cells = <2>;
+ compatible = "fsl,qe-pario-bank-a", "fsl,qe-pario-bank";
+ reg = <0x1400 0x18>;
+ gpio-controller;
+ };
+
+ qe_pio_e: gpio-controller@1460 {
+ #gpio-cells = <2>;
+ compatible = "fsl,qe-pario-bank-e", "fsl,qe-pario-bank";
+ reg = <0x1460 0x18>;
+ gpio-controller;
+ };
+
+
diff --git a/Documentation/powerpc/dts-bindings/gpio/led.txt b/Documentation/powerpc/dts-bindings/gpio/led.txt
index 4fe14deedc0..064db928c3c 100644
--- a/Documentation/powerpc/dts-bindings/gpio/led.txt
+++ b/Documentation/powerpc/dts-bindings/gpio/led.txt
@@ -16,10 +16,17 @@ LED sub-node properties:
string defining the trigger assigned to the LED. Current triggers are:
"backlight" - LED will act as a back-light, controlled by the framebuffer
system
- "default-on" - LED will turn on
+ "default-on" - LED will turn on, but see "default-state" below
"heartbeat" - LED "double" flashes at a load average based rate
"ide-disk" - LED indicates disk activity
"timer" - LED flashes at a fixed, configurable rate
+- default-state: (optional) The initial state of the LED. Valid
+ values are "on", "off", and "keep". If the LED is already on or off
+ and the default-state property is set the to same value, then no
+ glitch should be produced where the LED momentarily turns off (or
+ on). The "keep" setting will keep the LED at whatever its current
+ state is, without producing a glitch. The default is off if this
+ property is not present.
Examples:
@@ -30,14 +37,22 @@ leds {
gpios = <&mcu_pio 0 1>; /* Active low */
linux,default-trigger = "ide-disk";
};
+
+ fault {
+ gpios = <&mcu_pio 1 0>;
+ /* Keep LED on if BIOS detected hardware fault */
+ default-state = "keep";
+ };
};
run-control {
compatible = "gpio-leds";
red {
gpios = <&mpc8572 6 0>;
+ default-state = "off";
};
green {
gpios = <&mpc8572 7 0>;
+ default-state = "on";
};
}
diff --git a/Documentation/powerpc/dts-bindings/gpio/mdio.txt b/Documentation/powerpc/dts-bindings/gpio/mdio.txt
new file mode 100644
index 00000000000..bc954952901
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/gpio/mdio.txt
@@ -0,0 +1,19 @@
+MDIO on GPIOs
+
+Currently defined compatibles:
+- virtual,gpio-mdio
+
+MDC and MDIO lines connected to GPIO controllers are listed in the
+gpios property as described in section VIII.1 in the following order:
+
+MDC, MDIO.
+
+Example:
+
+mdio {
+ compatible = "virtual,mdio-gpio";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ gpios = <&qe_pio_a 11
+ &qe_pio_c 6>;
+};
diff --git a/Documentation/powerpc/dts-bindings/marvell.txt b/Documentation/powerpc/dts-bindings/marvell.txt
new file mode 100644
index 00000000000..3708a2fd474
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/marvell.txt
@@ -0,0 +1,521 @@
+Marvell Discovery mv64[345]6x System Controller chips
+===========================================================
+
+The Marvell mv64[345]60 series of system controller chips contain
+many of the peripherals needed to implement a complete computer
+system. In this section, we define device tree nodes to describe
+the system controller chip itself and each of the peripherals
+which it contains. Compatible string values for each node are
+prefixed with the string "marvell,", for Marvell Technology Group Ltd.
+
+1) The /system-controller node
+
+ This node is used to represent the system-controller and must be
+ present when the system uses a system controller chip. The top-level
+ system-controller node contains information that is global to all
+ devices within the system controller chip. The node name begins
+ with "system-controller" followed by the unit address, which is
+ the base address of the memory-mapped register set for the system
+ controller chip.
+
+ Required properties:
+
+ - ranges : Describes the translation of system controller addresses
+ for memory mapped registers.
+ - clock-frequency: Contains the main clock frequency for the system
+ controller chip.
+ - reg : This property defines the address and size of the
+ memory-mapped registers contained within the system controller
+ chip. The address specified in the "reg" property should match
+ the unit address of the system-controller node.
+ - #address-cells : Address representation for system controller
+ devices. This field represents the number of cells needed to
+ represent the address of the memory-mapped registers of devices
+ within the system controller chip.
+ - #size-cells : Size representation for for the memory-mapped
+ registers within the system controller chip.
+ - #interrupt-cells : Defines the width of cells used to represent
+ interrupts.
+
+ Optional properties:
+
+ - model : The specific model of the system controller chip. Such
+ as, "mv64360", "mv64460", or "mv64560".
+ - compatible : A string identifying the compatibility identifiers
+ of the system controller chip.
+
+ The system-controller node contains child nodes for each system
+ controller device that the platform uses. Nodes should not be created
+ for devices which exist on the system controller chip but are not used
+
+ Example Marvell Discovery mv64360 system-controller node:
+
+ system-controller@f1000000 { /* Marvell Discovery mv64360 */
+ #address-cells = <1>;
+ #size-cells = <1>;
+ model = "mv64360"; /* Default */
+ compatible = "marvell,mv64360";
+ clock-frequency = <133333333>;
+ reg = <0xf1000000 0x10000>;
+ virtual-reg = <0xf1000000>;
+ ranges = <0x88000000 0x88000000 0x1000000 /* PCI 0 I/O Space */
+ 0x80000000 0x80000000 0x8000000 /* PCI 0 MEM Space */
+ 0xa0000000 0xa0000000 0x4000000 /* User FLASH */
+ 0x00000000 0xf1000000 0x0010000 /* Bridge's regs */
+ 0xf2000000 0xf2000000 0x0040000>;/* Integrated SRAM */
+
+ [ child node definitions... ]
+ }
+
+2) Child nodes of /system-controller
+
+ a) Marvell Discovery MDIO bus
+
+ The MDIO is a bus to which the PHY devices are connected. For each
+ device that exists on this bus, a child node should be created. See
+ the definition of the PHY node below for an example of how to define
+ a PHY.
+
+ Required properties:
+ - #address-cells : Should be <1>
+ - #size-cells : Should be <0>
+ - device_type : Should be "mdio"
+ - compatible : Should be "marvell,mv64360-mdio"
+
+ Example:
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ device_type = "mdio";
+ compatible = "marvell,mv64360-mdio";
+
+ ethernet-phy@0 {
+ ......
+ };
+ };
+
+
+ b) Marvell Discovery ethernet controller
+
+ The Discover ethernet controller is described with two levels
+ of nodes. The first level describes an ethernet silicon block
+ and the second level describes up to 3 ethernet nodes within
+ that block. The reason for the multiple levels is that the
+ registers for the node are interleaved within a single set
+ of registers. The "ethernet-block" level describes the
+ shared register set, and the "ethernet" nodes describe ethernet
+ port-specific properties.
+
+ Ethernet block node
+
+ Required properties:
+ - #address-cells : <1>
+ - #size-cells : <0>
+ - compatible : "marvell,mv64360-eth-block"
+ - reg : Offset and length of the register set for this block
+
+ Example Discovery Ethernet block node:
+ ethernet-block@2000 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "marvell,mv64360-eth-block";
+ reg = <0x2000 0x2000>;
+ ethernet@0 {
+ .......
+ };
+ };
+
+ Ethernet port node
+
+ Required properties:
+ - device_type : Should be "network".
+ - compatible : Should be "marvell,mv64360-eth".
+ - reg : Should be <0>, <1>, or <2>, according to which registers
+ within the silicon block the device uses.
+ - interrupts : <a> where a is the interrupt number for the port.
+ - interrupt-parent : the phandle for the interrupt controller
+ that services interrupts for this device.
+ - phy : the phandle for the PHY connected to this ethernet
+ controller.
+ - local-mac-address : 6 bytes, MAC address
+
+ Example Discovery Ethernet port node:
+ ethernet@0 {
+ device_type = "network";
+ compatible = "marvell,mv64360-eth";
+ reg = <0>;
+ interrupts = <32>;
+ interrupt-parent = <&PIC>;
+ phy = <&PHY0>;
+ local-mac-address = [ 00 00 00 00 00 00 ];
+ };
+
+
+
+ c) Marvell Discovery PHY nodes
+
+ Required properties:
+ - device_type : Should be "ethernet-phy"
+ - interrupts : <a> where a is the interrupt number for this phy.
+ - interrupt-parent : the phandle for the interrupt controller that
+ services interrupts for this device.
+ - reg : The ID number for the phy, usually a small integer
+
+ Example Discovery PHY node:
+ ethernet-phy@1 {
+ device_type = "ethernet-phy";
+ compatible = "broadcom,bcm5421";
+ interrupts = <76>; /* GPP 12 */
+ interrupt-parent = <&PIC>;
+ reg = <1>;
+ };
+
+
+ d) Marvell Discovery SDMA nodes
+
+ Represent DMA hardware associated with the MPSC (multiprotocol
+ serial controllers).
+
+ Required properties:
+ - compatible : "marvell,mv64360-sdma"
+ - reg : Offset and length of the register set for this device
+ - interrupts : <a> where a is the interrupt number for the DMA
+ device.
+ - interrupt-parent : the phandle for the interrupt controller
+ that services interrupts for this device.
+
+ Example Discovery SDMA node:
+ sdma@4000 {
+ compatible = "marvell,mv64360-sdma";
+ reg = <0x4000 0xc18>;
+ virtual-reg = <0xf1004000>;
+ interrupts = <36>;
+ interrupt-parent = <&PIC>;
+ };
+
+
+ e) Marvell Discovery BRG nodes
+
+ Represent baud rate generator hardware associated with the MPSC
+ (multiprotocol serial controllers).
+
+ Required properties:
+ - compatible : "marvell,mv64360-brg"
+ - reg : Offset and length of the register set for this device
+ - clock-src : A value from 0 to 15 which selects the clock
+ source for the baud rate generator. This value corresponds
+ to the CLKS value in the BRGx configuration register. See
+ the mv64x60 User's Manual.
+ - clock-frequence : The frequency (in Hz) of the baud rate
+ generator's input clock.
+ - current-speed : The current speed setting (presumably by
+ firmware) of the baud rate generator.
+
+ Example Discovery BRG node:
+ brg@b200 {
+ compatible = "marvell,mv64360-brg";
+ reg = <0xb200 0x8>;
+ clock-src = <8>;
+ clock-frequency = <133333333>;
+ current-speed = <9600>;
+ };
+
+
+ f) Marvell Discovery CUNIT nodes
+
+ Represent the Serial Communications Unit device hardware.
+
+ Required properties:
+ - reg : Offset and length of the register set for this device
+
+ Example Discovery CUNIT node:
+ cunit@f200 {
+ reg = <0xf200 0x200>;
+ };
+
+
+ g) Marvell Discovery MPSCROUTING nodes
+
+ Represent the Discovery's MPSC routing hardware
+
+ Required properties:
+ - reg : Offset and length of the register set for this device
+
+ Example Discovery CUNIT node:
+ mpscrouting@b500 {
+ reg = <0xb400 0xc>;
+ };
+
+
+ h) Marvell Discovery MPSCINTR nodes
+
+ Represent the Discovery's MPSC DMA interrupt hardware registers
+ (SDMA cause and mask registers).
+
+ Required properties:
+ - reg : Offset and length of the register set for this device
+
+ Example Discovery MPSCINTR node:
+ mpsintr@b800 {
+ reg = <0xb800 0x100>;
+ };
+
+
+ i) Marvell Discovery MPSC nodes
+
+ Represent the Discovery's MPSC (Multiprotocol Serial Controller)
+ serial port.
+
+ Required properties:
+ - device_type : "serial"
+ - compatible : "marvell,mv64360-mpsc"
+ - reg : Offset and length of the register set for this device
+ - sdma : the phandle for the SDMA node used by this port
+ - brg : the phandle for the BRG node used by this port
+ - cunit : the phandle for the CUNIT node used by this port
+ - mpscrouting : the phandle for the MPSCROUTING node used by this port
+ - mpscintr : the phandle for the MPSCINTR node used by this port
+ - cell-index : the hardware index of this cell in the MPSC core
+ - max_idle : value needed for MPSC CHR3 (Maximum Frame Length)
+ register
+ - interrupts : <a> where a is the interrupt number for the MPSC.
+ - interrupt-parent : the phandle for the interrupt controller
+ that services interrupts for this device.
+
+ Example Discovery MPSCINTR node:
+ mpsc@8000 {
+ device_type = "serial";
+ compatible = "marvell,mv64360-mpsc";
+ reg = <0x8000 0x38>;
+ virtual-reg = <0xf1008000>;
+ sdma = <&SDMA0>;
+ brg = <&BRG0>;
+ cunit = <&CUNIT>;
+ mpscrouting = <&MPSCROUTING>;
+ mpscintr = <&MPSCINTR>;
+ cell-index = <0>;
+ max_idle = <40>;
+ interrupts = <40>;
+ interrupt-parent = <&PIC>;
+ };
+
+
+ j) Marvell Discovery Watch Dog Timer nodes
+
+ Represent the Discovery's watchdog timer hardware
+
+ Required properties:
+ - compatible : "marvell,mv64360-wdt"
+ - reg : Offset and length of the register set for this device
+
+ Example Discovery Watch Dog Timer node:
+ wdt@b410 {
+ compatible = "marvell,mv64360-wdt";
+ reg = <0xb410 0x8>;
+ };
+
+
+ k) Marvell Discovery I2C nodes
+
+ Represent the Discovery's I2C hardware
+
+ Required properties:
+ - device_type : "i2c"
+ - compatible : "marvell,mv64360-i2c"
+ - reg : Offset and length of the register set for this device
+ - interrupts : <a> where a is the interrupt number for the I2C.
+ - interrupt-parent : the phandle for the interrupt controller
+ that services interrupts for this device.
+
+ Example Discovery I2C node:
+ compatible = "marvell,mv64360-i2c";
+ reg = <0xc000 0x20>;
+ virtual-reg = <0xf100c000>;
+ interrupts = <37>;
+ interrupt-parent = <&PIC>;
+ };
+
+
+ l) Marvell Discovery PIC (Programmable Interrupt Controller) nodes
+
+ Represent the Discovery's PIC hardware
+
+ Required properties:
+ - #interrupt-cells : <1>
+ - #address-cells : <0>
+ - compatible : "marvell,mv64360-pic"
+ - reg : Offset and length of the register set for this device
+ - interrupt-controller
+
+ Example Discovery PIC node:
+ pic {
+ #interrupt-cells = <1>;
+ #address-cells = <0>;
+ compatible = "marvell,mv64360-pic";
+ reg = <0x0 0x88>;
+ interrupt-controller;
+ };
+
+
+ m) Marvell Discovery MPP (Multipurpose Pins) multiplexing nodes
+
+ Represent the Discovery's MPP hardware
+
+ Required properties:
+ - compatible : "marvell,mv64360-mpp"
+ - reg : Offset and length of the register set for this device
+
+ Example Discovery MPP node:
+ mpp@f000 {
+ compatible = "marvell,mv64360-mpp";
+ reg = <0xf000 0x10>;
+ };
+
+
+ n) Marvell Discovery GPP (General Purpose Pins) nodes
+
+ Represent the Discovery's GPP hardware
+
+ Required properties:
+ - compatible : "marvell,mv64360-gpp"
+ - reg : Offset and length of the register set for this device
+
+ Example Discovery GPP node:
+ gpp@f000 {
+ compatible = "marvell,mv64360-gpp";
+ reg = <0xf100 0x20>;
+ };
+
+
+ o) Marvell Discovery PCI host bridge node
+
+ Represents the Discovery's PCI host bridge device. The properties
+ for this node conform to Rev 2.1 of the PCI Bus Binding to IEEE
+ 1275-1994. A typical value for the compatible property is
+ "marvell,mv64360-pci".
+
+ Example Discovery PCI host bridge node
+ pci@80000000 {
+ #address-cells = <3>;
+ #size-cells = <2>;
+ #interrupt-cells = <1>;
+ device_type = "pci";
+ compatible = "marvell,mv64360-pci";
+ reg = <0xcf8 0x8>;
+ ranges = <0x01000000 0x0 0x0
+ 0x88000000 0x0 0x01000000
+ 0x02000000 0x0 0x80000000
+ 0x80000000 0x0 0x08000000>;
+ bus-range = <0 255>;
+ clock-frequency = <66000000>;
+ interrupt-parent = <&PIC>;
+ interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
+ interrupt-map = <
+ /* IDSEL 0x0a */
+ 0x5000 0 0 1 &PIC 80
+ 0x5000 0 0 2 &PIC 81
+ 0x5000 0 0 3 &PIC 91
+ 0x5000 0 0 4 &PIC 93
+
+ /* IDSEL 0x0b */
+ 0x5800 0 0 1 &PIC 91
+ 0x5800 0 0 2 &PIC 93
+ 0x5800 0 0 3 &PIC 80
+ 0x5800 0 0 4 &PIC 81
+
+ /* IDSEL 0x0c */
+ 0x6000 0 0 1 &PIC 91
+ 0x6000 0 0 2 &PIC 93
+ 0x6000 0 0 3 &PIC 80
+ 0x6000 0 0 4 &PIC 81
+
+ /* IDSEL 0x0d */
+ 0x6800 0 0 1 &PIC 93
+ 0x6800 0 0 2 &PIC 80
+ 0x6800 0 0 3 &PIC 81
+ 0x6800 0 0 4 &PIC 91
+ >;
+ };
+
+
+ p) Marvell Discovery CPU Error nodes
+
+ Represent the Discovery's CPU error handler device.
+
+ Required properties:
+ - compatible : "marvell,mv64360-cpu-error"
+ - reg : Offset and length of the register set for this device
+ - interrupts : the interrupt number for this device
+ - interrupt-parent : the phandle for the interrupt controller
+ that services interrupts for this device.
+
+ Example Discovery CPU Error node:
+ cpu-error@0070 {
+ compatible = "marvell,mv64360-cpu-error";
+ reg = <0x70 0x10 0x128 0x28>;
+ interrupts = <3>;
+ interrupt-parent = <&PIC>;
+ };
+
+
+ q) Marvell Discovery SRAM Controller nodes
+
+ Represent the Discovery's SRAM controller device.
+
+ Required properties:
+ - compatible : "marvell,mv64360-sram-ctrl"
+ - reg : Offset and length of the register set for this device
+ - interrupts : the interrupt number for this device
+ - interrupt-parent : the phandle for the interrupt controller
+ that services interrupts for this device.
+
+ Example Discovery SRAM Controller node:
+ sram-ctrl@0380 {
+ compatible = "marvell,mv64360-sram-ctrl";
+ reg = <0x380 0x80>;
+ interrupts = <13>;
+ interrupt-parent = <&PIC>;
+ };
+
+
+ r) Marvell Discovery PCI Error Handler nodes
+
+ Represent the Discovery's PCI error handler device.
+
+ Required properties:
+ - compatible : "marvell,mv64360-pci-error"
+ - reg : Offset and length of the register set for this device
+ - interrupts : the interrupt number for this device
+ - interrupt-parent : the phandle for the interrupt controller
+ that services interrupts for this device.
+
+ Example Discovery PCI Error Handler node:
+ pci-error@1d40 {
+ compatible = "marvell,mv64360-pci-error";
+ reg = <0x1d40 0x40 0xc28 0x4>;
+ interrupts = <12>;
+ interrupt-parent = <&PIC>;
+ };
+
+
+ s) Marvell Discovery Memory Controller nodes
+
+ Represent the Discovery's memory controller device.
+
+ Required properties:
+ - compatible : "marvell,mv64360-mem-ctrl"
+ - reg : Offset and length of the register set for this device
+ - interrupts : the interrupt number for this device
+ - interrupt-parent : the phandle for the interrupt controller
+ that services interrupts for this device.
+
+ Example Discovery Memory Controller node:
+ mem-ctrl@1400 {
+ compatible = "marvell,mv64360-mem-ctrl";
+ reg = <0x1400 0x60>;
+ interrupts = <17>;
+ interrupt-parent = <&PIC>;
+ };
+
+
diff --git a/Documentation/powerpc/dts-bindings/phy.txt b/Documentation/powerpc/dts-bindings/phy.txt
new file mode 100644
index 00000000000..bb8c742eb8c
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/phy.txt
@@ -0,0 +1,25 @@
+PHY nodes
+
+Required properties:
+
+ - device_type : Should be "ethernet-phy"
+ - interrupts : <a b> where a is the interrupt number and b is a
+ field that represents an encoding of the sense and level
+ information for the interrupt. This should be encoded based on
+ the information in section 2) depending on the type of interrupt
+ controller you have.
+ - interrupt-parent : the phandle for the interrupt controller that
+ services interrupts for this device.
+ - reg : The ID number for the phy, usually a small integer
+ - linux,phandle : phandle for this node; likely referenced by an
+ ethernet controller node.
+
+Example:
+
+ethernet-phy@0 {
+ linux,phandle = <2452000>
+ interrupt-parent = <40000>;
+ interrupts = <35 1>;
+ reg = <0>;
+ device_type = "ethernet-phy";
+};
diff --git a/Documentation/powerpc/dts-bindings/spi-bus.txt b/Documentation/powerpc/dts-bindings/spi-bus.txt
new file mode 100644
index 00000000000..e782add2e45
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/spi-bus.txt
@@ -0,0 +1,57 @@
+SPI (Serial Peripheral Interface) busses
+
+SPI busses can be described with a node for the SPI master device
+and a set of child nodes for each SPI slave on the bus. For this
+discussion, it is assumed that the system's SPI controller is in
+SPI master mode. This binding does not describe SPI controllers
+in slave mode.
+
+The SPI master node requires the following properties:
+- #address-cells - number of cells required to define a chip select
+ address on the SPI bus.
+- #size-cells - should be zero.
+- compatible - name of SPI bus controller following generic names
+ recommended practice.
+No other properties are required in the SPI bus node. It is assumed
+that a driver for an SPI bus device will understand that it is an SPI bus.
+However, the binding does not attempt to define the specific method for
+assigning chip select numbers. Since SPI chip select configuration is
+flexible and non-standardized, it is left out of this binding with the
+assumption that board specific platform code will be used to manage
+chip selects. Individual drivers can define additional properties to
+support describing the chip select layout.
+
+SPI slave nodes must be children of the SPI master node and can
+contain the following properties.
+- reg - (required) chip select address of device.
+- compatible - (required) name of SPI device following generic names
+ recommended practice
+- spi-max-frequency - (required) Maximum SPI clocking speed of device in Hz
+- spi-cpol - (optional) Empty property indicating device requires
+ inverse clock polarity (CPOL) mode
+- spi-cpha - (optional) Empty property indicating device requires
+ shifted clock phase (CPHA) mode
+- spi-cs-high - (optional) Empty property indicating device requires
+ chip select active high
+
+SPI example for an MPC5200 SPI bus:
+ spi@f00 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "fsl,mpc5200b-spi","fsl,mpc5200-spi";
+ reg = <0xf00 0x20>;
+ interrupts = <2 13 0 2 14 0>;
+ interrupt-parent = <&mpc5200_pic>;
+
+ ethernet-switch@0 {
+ compatible = "micrel,ks8995m";
+ spi-max-frequency = <1000000>;
+ reg = <0>;
+ };
+
+ codec@1 {
+ compatible = "ti,tlv320aic26";
+ spi-max-frequency = <100000>;
+ reg = <1>;
+ };
+ };
diff --git a/Documentation/powerpc/dts-bindings/usb-ehci.txt b/Documentation/powerpc/dts-bindings/usb-ehci.txt
new file mode 100644
index 00000000000..fa18612f757
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/usb-ehci.txt
@@ -0,0 +1,25 @@
+USB EHCI controllers
+
+Required properties:
+ - compatible : should be "usb-ehci".
+ - reg : should contain at least address and length of the standard EHCI
+ register set for the device. Optional platform-dependent registers
+ (debug-port or other) can be also specified here, but only after
+ definition of standard EHCI registers.
+ - interrupts : one EHCI interrupt should be described here.
+If device registers are implemented in big endian mode, the device
+node should have "big-endian-regs" property.
+If controller implementation operates with big endian descriptors,
+"big-endian-desc" property should be specified.
+If both big endian registers and descriptors are used by the controller
+implementation, "big-endian" property can be specified instead of having
+both "big-endian-regs" and "big-endian-desc".
+
+Example (Sequoia 440EPx):
+ ehci@e0000300 {
+ compatible = "ibm,usb-ehci-440epx", "usb-ehci";
+ interrupt-parent = <&UIC0>;
+ interrupts = <1a 4>;
+ reg = <0 e0000300 90 0 e0000390 70>;
+ big-endian;
+ };
diff --git a/Documentation/powerpc/dts-bindings/xilinx.txt b/Documentation/powerpc/dts-bindings/xilinx.txt
new file mode 100644
index 00000000000..80339fe4300
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/xilinx.txt
@@ -0,0 +1,295 @@
+ d) Xilinx IP cores
+
+ The Xilinx EDK toolchain ships with a set of IP cores (devices) for use
+ in Xilinx Spartan and Virtex FPGAs. The devices cover the whole range
+ of standard device types (network, serial, etc.) and miscellaneous
+ devices (gpio, LCD, spi, etc). Also, since these devices are
+ implemented within the fpga fabric every instance of the device can be
+ synthesised with different options that change the behaviour.
+
+ Each IP-core has a set of parameters which the FPGA designer can use to
+ control how the core is synthesized. Historically, the EDK tool would
+ extract the device parameters relevant to device drivers and copy them
+ into an 'xparameters.h' in the form of #define symbols. This tells the
+ device drivers how the IP cores are configured, but it requres the kernel
+ to be recompiled every time the FPGA bitstream is resynthesized.
+
+ The new approach is to export the parameters into the device tree and
+ generate a new device tree each time the FPGA bitstream changes. The
+ parameters which used to be exported as #defines will now become
+ properties of the device node. In general, device nodes for IP-cores
+ will take the following form:
+
+ (name): (generic-name)@(base-address) {
+ compatible = "xlnx,(ip-core-name)-(HW_VER)"
+ [, (list of compatible devices), ...];
+ reg = <(baseaddr) (size)>;
+ interrupt-parent = <&interrupt-controller-phandle>;
+ interrupts = < ... >;
+ xlnx,(parameter1) = "(string-value)";
+ xlnx,(parameter2) = <(int-value)>;
+ };
+
+ (generic-name): an open firmware-style name that describes the
+ generic class of device. Preferably, this is one word, such
+ as 'serial' or 'ethernet'.
+ (ip-core-name): the name of the ip block (given after the BEGIN
+ directive in system.mhs). Should be in lowercase
+ and all underscores '_' converted to dashes '-'.
+ (name): is derived from the "PARAMETER INSTANCE" value.
+ (parameter#): C_* parameters from system.mhs. The C_ prefix is
+ dropped from the parameter name, the name is converted
+ to lowercase and all underscore '_' characters are
+ converted to dashes '-'.
+ (baseaddr): the baseaddr parameter value (often named C_BASEADDR).
+ (HW_VER): from the HW_VER parameter.
+ (size): the address range size (often C_HIGHADDR - C_BASEADDR + 1).
+
+ Typically, the compatible list will include the exact IP core version
+ followed by an older IP core version which implements the same
+ interface or any other device with the same interface.
+
+ 'reg', 'interrupt-parent' and 'interrupts' are all optional properties.
+
+ For example, the following block from system.mhs:
+
+ BEGIN opb_uartlite
+ PARAMETER INSTANCE = opb_uartlite_0
+ PARAMETER HW_VER = 1.00.b
+ PARAMETER C_BAUDRATE = 115200
+ PARAMETER C_DATA_BITS = 8
+ PARAMETER C_ODD_PARITY = 0
+ PARAMETER C_USE_PARITY = 0
+ PARAMETER C_CLK_FREQ = 50000000
+ PARAMETER C_BASEADDR = 0xEC100000
+ PARAMETER C_HIGHADDR = 0xEC10FFFF
+ BUS_INTERFACE SOPB = opb_7
+ PORT OPB_Clk = CLK_50MHz
+ PORT Interrupt = opb_uartlite_0_Interrupt
+ PORT RX = opb_uartlite_0_RX
+ PORT TX = opb_uartlite_0_TX
+ PORT OPB_Rst = sys_bus_reset_0
+ END
+
+ becomes the following device tree node:
+
+ opb_uartlite_0: serial@ec100000 {
+ device_type = "serial";
+ compatible = "xlnx,opb-uartlite-1.00.b";
+ reg = <ec100000 10000>;
+ interrupt-parent = <&opb_intc_0>;
+ interrupts = <1 0>; // got this from the opb_intc parameters
+ current-speed = <d#115200>; // standard serial device prop
+ clock-frequency = <d#50000000>; // standard serial device prop
+ xlnx,data-bits = <8>;
+ xlnx,odd-parity = <0>;
+ xlnx,use-parity = <0>;
+ };
+
+ Some IP cores actually implement 2 or more logical devices. In
+ this case, the device should still describe the whole IP core with
+ a single node and add a child node for each logical device. The
+ ranges property can be used to translate from parent IP-core to the
+ registers of each device. In addition, the parent node should be
+ compatible with the bus type 'xlnx,compound', and should contain
+ #address-cells and #size-cells, as with any other bus. (Note: this
+ makes the assumption that both logical devices have the same bus
+ binding. If this is not true, then separate nodes should be used
+ for each logical device). The 'cell-index' property can be used to
+ enumerate logical devices within an IP core. For example, the
+ following is the system.mhs entry for the dual ps2 controller found
+ on the ml403 reference design.
+
+ BEGIN opb_ps2_dual_ref
+ PARAMETER INSTANCE = opb_ps2_dual_ref_0
+ PARAMETER HW_VER = 1.00.a
+ PARAMETER C_BASEADDR = 0xA9000000
+ PARAMETER C_HIGHADDR = 0xA9001FFF
+ BUS_INTERFACE SOPB = opb_v20_0
+ PORT Sys_Intr1 = ps2_1_intr
+ PORT Sys_Intr2 = ps2_2_intr
+ PORT Clkin1 = ps2_clk_rx_1
+ PORT Clkin2 = ps2_clk_rx_2
+ PORT Clkpd1 = ps2_clk_tx_1
+ PORT Clkpd2 = ps2_clk_tx_2
+ PORT Rx1 = ps2_d_rx_1
+ PORT Rx2 = ps2_d_rx_2
+ PORT Txpd1 = ps2_d_tx_1
+ PORT Txpd2 = ps2_d_tx_2
+ END
+
+ It would result in the following device tree nodes:
+
+ opb_ps2_dual_ref_0: opb-ps2-dual-ref@a9000000 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "xlnx,compound";
+ ranges = <0 a9000000 2000>;
+ // If this device had extra parameters, then they would
+ // go here.
+ ps2@0 {
+ compatible = "xlnx,opb-ps2-dual-ref-1.00.a";
+ reg = <0 40>;
+ interrupt-parent = <&opb_intc_0>;
+ interrupts = <3 0>;
+ cell-index = <0>;
+ };
+ ps2@1000 {
+ compatible = "xlnx,opb-ps2-dual-ref-1.00.a";
+ reg = <1000 40>;
+ interrupt-parent = <&opb_intc_0>;
+ interrupts = <3 0>;
+ cell-index = <0>;
+ };
+ };
+
+ Also, the system.mhs file defines bus attachments from the processor
+ to the devices. The device tree structure should reflect the bus
+ attachments. Again an example; this system.mhs fragment:
+
+ BEGIN ppc405_virtex4
+ PARAMETER INSTANCE = ppc405_0
+ PARAMETER HW_VER = 1.01.a
+ BUS_INTERFACE DPLB = plb_v34_0
+ BUS_INTERFACE IPLB = plb_v34_0
+ END
+
+ BEGIN opb_intc
+ PARAMETER INSTANCE = opb_intc_0
+ PARAMETER HW_VER = 1.00.c
+ PARAMETER C_BASEADDR = 0xD1000FC0
+ PARAMETER C_HIGHADDR = 0xD1000FDF
+ BUS_INTERFACE SOPB = opb_v20_0
+ END
+
+ BEGIN opb_uart16550
+ PARAMETER INSTANCE = opb_uart16550_0
+ PARAMETER HW_VER = 1.00.d
+ PARAMETER C_BASEADDR = 0xa0000000
+ PARAMETER C_HIGHADDR = 0xa0001FFF
+ BUS_INTERFACE SOPB = opb_v20_0
+ END
+
+ BEGIN plb_v34
+ PARAMETER INSTANCE = plb_v34_0
+ PARAMETER HW_VER = 1.02.a
+ END
+
+ BEGIN plb_bram_if_cntlr
+ PARAMETER INSTANCE = plb_bram_if_cntlr_0
+ PARAMETER HW_VER = 1.00.b
+ PARAMETER C_BASEADDR = 0xFFFF0000
+ PARAMETER C_HIGHADDR = 0xFFFFFFFF
+ BUS_INTERFACE SPLB = plb_v34_0
+ END
+
+ BEGIN plb2opb_bridge
+ PARAMETER INSTANCE = plb2opb_bridge_0
+ PARAMETER HW_VER = 1.01.a
+ PARAMETER C_RNG0_BASEADDR = 0x20000000
+ PARAMETER C_RNG0_HIGHADDR = 0x3FFFFFFF
+ PARAMETER C_RNG1_BASEADDR = 0x60000000
+ PARAMETER C_RNG1_HIGHADDR = 0x7FFFFFFF
+ PARAMETER C_RNG2_BASEADDR = 0x80000000
+ PARAMETER C_RNG2_HIGHADDR = 0xBFFFFFFF
+ PARAMETER C_RNG3_BASEADDR = 0xC0000000
+ PARAMETER C_RNG3_HIGHADDR = 0xDFFFFFFF
+ BUS_INTERFACE SPLB = plb_v34_0
+ BUS_INTERFACE MOPB = opb_v20_0
+ END
+
+ Gives this device tree (some properties removed for clarity):
+
+ plb@0 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "xlnx,plb-v34-1.02.a";
+ device_type = "ibm,plb";
+ ranges; // 1:1 translation
+
+ plb_bram_if_cntrl_0: bram@ffff0000 {
+ reg = <ffff0000 10000>;
+ }
+
+ opb@20000000 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges = <20000000 20000000 20000000
+ 60000000 60000000 20000000
+ 80000000 80000000 40000000
+ c0000000 c0000000 20000000>;
+
+ opb_uart16550_0: serial@a0000000 {
+ reg = <a00000000 2000>;
+ };
+
+ opb_intc_0: interrupt-controller@d1000fc0 {
+ reg = <d1000fc0 20>;
+ };
+ };
+ };
+
+ That covers the general approach to binding xilinx IP cores into the
+ device tree. The following are bindings for specific devices:
+
+ i) Xilinx ML300 Framebuffer
+
+ Simple framebuffer device from the ML300 reference design (also on the
+ ML403 reference design as well as others).
+
+ Optional properties:
+ - resolution = <xres yres> : pixel resolution of framebuffer. Some
+ implementations use a different resolution.
+ Default is <d#640 d#480>
+ - virt-resolution = <xvirt yvirt> : Size of framebuffer in memory.
+ Default is <d#1024 d#480>.
+ - rotate-display (empty) : rotate display 180 degrees.
+
+ ii) Xilinx SystemACE
+
+ The Xilinx SystemACE device is used to program FPGAs from an FPGA
+ bitstream stored on a CF card. It can also be used as a generic CF
+ interface device.
+
+ Optional properties:
+ - 8-bit (empty) : Set this property for SystemACE in 8 bit mode
+
+ iii) Xilinx EMAC and Xilinx TEMAC
+
+ Xilinx Ethernet devices. In addition to general xilinx properties
+ listed above, nodes for these devices should include a phy-handle
+ property, and may include other common network device properties
+ like local-mac-address.
+
+ iv) Xilinx Uartlite
+
+ Xilinx uartlite devices are simple fixed speed serial ports.
+
+ Required properties:
+ - current-speed : Baud rate of uartlite
+
+ v) Xilinx hwicap
+
+ Xilinx hwicap devices provide access to the configuration logic
+ of the FPGA through the Internal Configuration Access Port
+ (ICAP). The ICAP enables partial reconfiguration of the FPGA,
+ readback of the configuration information, and some control over
+ 'warm boots' of the FPGA fabric.
+
+ Required properties:
+ - xlnx,family : The family of the FPGA, necessary since the
+ capabilities of the underlying ICAP hardware
+ differ between different families. May be
+ 'virtex2p', 'virtex4', or 'virtex5'.
+
+ vi) Xilinx Uart 16550
+
+ Xilinx UART 16550 devices are very similar to the NS16550 but with
+ different register spacing and an offset from the base address.
+
+ Required properties:
+ - clock-frequency : Frequency of the clock input
+ - reg-offset : A value of 3 is required
+ - reg-shift : A value of 2 is required
+
+
diff --git a/Documentation/powerpc/qe_firmware.txt b/Documentation/powerpc/qe_firmware.txt
index 06da4d4b44f..2031ddb33d0 100644
--- a/Documentation/powerpc/qe_firmware.txt
+++ b/Documentation/powerpc/qe_firmware.txt
@@ -225,7 +225,7 @@ For example, to match the 8323, revision 1.0:
soc.major = 1
soc.minor = 0
-'padding' is neccessary for structure alignment. This field ensures that the
+'padding' is necessary for structure alignment. This field ensures that the
'extended_modes' field is aligned on a 64-bit boundary.
'extended_modes' is a bitfield that defines special functionality which has an
diff --git a/Documentation/pps/pps.txt b/Documentation/pps/pps.txt
new file mode 100644
index 00000000000..125f4ab4899
--- /dev/null
+++ b/Documentation/pps/pps.txt
@@ -0,0 +1,172 @@
+
+ PPS - Pulse Per Second
+ ----------------------
+
+(C) Copyright 2007 Rodolfo Giometti <giometti@enneenne.com>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+
+
+Overview
+--------
+
+LinuxPPS provides a programming interface (API) to define in the
+system several PPS sources.
+
+PPS means "pulse per second" and a PPS source is just a device which
+provides a high precision signal each second so that an application
+can use it to adjust system clock time.
+
+A PPS source can be connected to a serial port (usually to the Data
+Carrier Detect pin) or to a parallel port (ACK-pin) or to a special
+CPU's GPIOs (this is the common case in embedded systems) but in each
+case when a new pulse arrives the system must apply to it a timestamp
+and record it for userland.
+
+Common use is the combination of the NTPD as userland program, with a
+GPS receiver as PPS source, to obtain a wallclock-time with
+sub-millisecond synchronisation to UTC.
+
+
+RFC considerations
+------------------
+
+While implementing a PPS API as RFC 2783 defines and using an embedded
+CPU GPIO-Pin as physical link to the signal, I encountered a deeper
+problem:
+
+ At startup it needs a file descriptor as argument for the function
+ time_pps_create().
+
+This implies that the source has a /dev/... entry. This assumption is
+ok for the serial and parallel port, where you can do something
+useful besides(!) the gathering of timestamps as it is the central
+task for a PPS-API. But this assumption does not work for a single
+purpose GPIO line. In this case even basic file-related functionality
+(like read() and write()) makes no sense at all and should not be a
+precondition for the use of a PPS-API.
+
+The problem can be simply solved if you consider that a PPS source is
+not always connected with a GPS data source.
+
+So your programs should check if the GPS data source (the serial port
+for instance) is a PPS source too, and if not they should provide the
+possibility to open another device as PPS source.
+
+In LinuxPPS the PPS sources are simply char devices usually mapped
+into files /dev/pps0, /dev/pps1, etc..
+
+
+Coding example
+--------------
+
+To register a PPS source into the kernel you should define a struct
+pps_source_info_s as follows:
+
+ static struct pps_source_info pps_ktimer_info = {
+ .name = "ktimer",
+ .path = "",
+ .mode = PPS_CAPTUREASSERT | PPS_OFFSETASSERT | \
+ PPS_ECHOASSERT | \
+ PPS_CANWAIT | PPS_TSFMT_TSPEC,
+ .echo = pps_ktimer_echo,
+ .owner = THIS_MODULE,
+ };
+
+and then calling the function pps_register_source() in your
+intialization routine as follows:
+
+ source = pps_register_source(&pps_ktimer_info,
+ PPS_CAPTUREASSERT | PPS_OFFSETASSERT);
+
+The pps_register_source() prototype is:
+
+ int pps_register_source(struct pps_source_info_s *info, int default_params)
+
+where "info" is a pointer to a structure that describes a particular
+PPS source, "default_params" tells the system what the initial default
+parameters for the device should be (it is obvious that these parameters
+must be a subset of ones defined in the struct
+pps_source_info_s which describe the capabilities of the driver).
+
+Once you have registered a new PPS source into the system you can
+signal an assert event (for example in the interrupt handler routine)
+just using:
+
+ pps_event(source, &ts, PPS_CAPTUREASSERT, ptr)
+
+where "ts" is the event's timestamp.
+
+The same function may also run the defined echo function
+(pps_ktimer_echo(), passing to it the "ptr" pointer) if the user
+asked for that... etc..
+
+Please see the file drivers/pps/clients/ktimer.c for example code.
+
+
+SYSFS support
+-------------
+
+If the SYSFS filesystem is enabled in the kernel it provides a new class:
+
+ $ ls /sys/class/pps/
+ pps0/ pps1/ pps2/
+
+Every directory is the ID of a PPS sources defined in the system and
+inside you find several files:
+
+ $ ls /sys/class/pps/pps0/
+ assert clear echo mode name path subsystem@ uevent
+
+Inside each "assert" and "clear" file you can find the timestamp and a
+sequence number:
+
+ $ cat /sys/class/pps/pps0/assert
+ 1170026870.983207967#8
+
+Where before the "#" is the timestamp in seconds; after it is the
+sequence number. Other files are:
+
+* echo: reports if the PPS source has an echo function or not;
+
+* mode: reports available PPS functioning modes;
+
+* name: reports the PPS source's name;
+
+* path: reports the PPS source's device path, that is the device the
+ PPS source is connected to (if it exists).
+
+
+Testing the PPS support
+-----------------------
+
+In order to test the PPS support even without specific hardware you can use
+the ktimer driver (see the client subsection in the PPS configuration menu)
+and the userland tools provided into Documentaion/pps/ directory.
+
+Once you have enabled the compilation of ktimer just modprobe it (if
+not statically compiled):
+
+ # modprobe ktimer
+
+and the run ppstest as follow:
+
+ $ ./ppstest /dev/pps0
+ trying PPS source "/dev/pps1"
+ found PPS source "/dev/pps1"
+ ok, found 1 source(s), now start fetching data...
+ source 0 - assert 1186592699.388832443, sequence: 364 - clear 0.000000000, sequence: 0
+ source 0 - assert 1186592700.388931295, sequence: 365 - clear 0.000000000, sequence: 0
+ source 0 - assert 1186592701.389032765, sequence: 366 - clear 0.000000000, sequence: 0
+
+Please, note that to compile userland programs you need the file timepps.h
+(see Documentation/pps/).
diff --git a/Documentation/rbtree.txt b/Documentation/rbtree.txt
index 7224459b469..aae8355d316 100644
--- a/Documentation/rbtree.txt
+++ b/Documentation/rbtree.txt
@@ -131,8 +131,8 @@ Example:
}
/* Add new node and rebalance tree. */
- rb_link_node(data->node, parent, new);
- rb_insert_color(data->node, root);
+ rb_link_node(&data->node, parent, new);
+ rb_insert_color(&data->node, root);
return TRUE;
}
@@ -146,10 +146,10 @@ To remove an existing node from a tree, call:
Example:
- struct mytype *data = mysearch(mytree, "walrus");
+ struct mytype *data = mysearch(&mytree, "walrus");
if (data) {
- rb_erase(data->node, mytree);
+ rb_erase(&data->node, &mytree);
myfree(data);
}
@@ -188,5 +188,5 @@ Example:
struct rb_node *node;
for (node = rb_first(&mytree); node; node = rb_next(node))
- printk("key=%s\n", rb_entry(node, int, keystring));
+ printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);
diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt
index 4d3ee317a4a..b4860509c31 100644
--- a/Documentation/rfkill.txt
+++ b/Documentation/rfkill.txt
@@ -1,575 +1,139 @@
-rfkill - RF switch subsystem support
-====================================
+rfkill - RF kill switch support
+===============================
-1 Introduction
-2 Implementation details
-3 Kernel driver guidelines
-3.1 wireless device drivers
-3.2 platform/switch drivers
-3.3 input device drivers
-4 Kernel API
-5 Userspace support
+1. Introduction
+2. Implementation details
+3. Kernel API
+4. Userspace support
-1. Introduction:
+1. Introduction
-The rfkill switch subsystem exists to add a generic interface to circuitry that
-can enable or disable the signal output of a wireless *transmitter* of any
-type. By far, the most common use is to disable radio-frequency transmitters.
+The rfkill subsystem provides a generic interface to disabling any radio
+transmitter in the system. When a transmitter is blocked, it shall not
+radiate any power.
-Note that disabling the signal output means that the the transmitter is to be
-made to not emit any energy when "blocked". rfkill is not about blocking data
-transmissions, it is about blocking energy emission.
+The subsystem also provides the ability to react on button presses and
+disable all transmitters of a certain type (or all). This is intended for
+situations where transmitters need to be turned off, for example on
+aircraft.
-The rfkill subsystem offers support for keys and switches often found on
-laptops to enable wireless devices like WiFi and Bluetooth, so that these keys
-and switches actually perform an action in all wireless devices of a given type
-attached to the system.
+The rfkill subsystem has a concept of "hard" and "soft" block, which
+differ little in their meaning (block == transmitters off) but rather in
+whether they can be changed or not:
+ - hard block: read-only radio block that cannot be overriden by software
+ - soft block: writable radio block (need not be readable) that is set by
+ the system software.
-The buttons to enable and disable the wireless transmitters are important in
-situations where the user is for example using his laptop on a location where
-radio-frequency transmitters _must_ be disabled (e.g. airplanes).
-Because of this requirement, userspace support for the keys should not be made
-mandatory. Because userspace might want to perform some additional smarter
-tasks when the key is pressed, rfkill provides userspace the possibility to
-take over the task to handle the key events.
+2. Implementation details
-===============================================================================
-2: Implementation details
+The rfkill subsystem is composed of three main components:
+ * the rfkill core,
+ * the deprecated rfkill-input module (an input layer handler, being
+ replaced by userspace policy code) and
+ * the rfkill drivers.
-The rfkill subsystem is composed of various components: the rfkill class, the
-rfkill-input module (an input layer handler), and some specific input layer
-events.
+The rfkill core provides API for kernel drivers to register their radio
+transmitter with the kernel, methods for turning it on and off and, letting
+the system know about hardware-disabled states that may be implemented on
+the device.
-The rfkill class provides kernel drivers with an interface that allows them to
-know when they should enable or disable a wireless network device transmitter.
-This is enabled by the CONFIG_RFKILL Kconfig option.
+The rfkill core code also notifies userspace of state changes, and provides
+ways for userspace to query the current states. See the "Userspace support"
+section below.
-The rfkill class support makes sure userspace will be notified of all state
-changes on rfkill devices through uevents. It provides a notification chain
-for interested parties in the kernel to also get notified of rfkill state
-changes in other drivers. It creates several sysfs entries which can be used
-by userspace. See section "Userspace support".
-
-The rfkill-input module provides the kernel with the ability to implement a
-basic response when the user presses a key or button (or toggles a switch)
-related to rfkill functionality. It is an in-kernel implementation of default
-policy of reacting to rfkill-related input events and neither mandatory nor
-required for wireless drivers to operate. It is enabled by the
-CONFIG_RFKILL_INPUT Kconfig option.
-
-rfkill-input is a rfkill-related events input layer handler. This handler will
-listen to all rfkill key events and will change the rfkill state of the
-wireless devices accordingly. With this option enabled userspace could either
-do nothing or simply perform monitoring tasks.
-
-The rfkill-input module also provides EPO (emergency power-off) functionality
-for all wireless transmitters. This function cannot be overridden, and it is
-always active. rfkill EPO is related to *_RFKILL_ALL input layer events.
-
-
-Important terms for the rfkill subsystem:
-
-In order to avoid confusion, we avoid the term "switch" in rfkill when it is
-referring to an electronic control circuit that enables or disables a
-transmitter. We reserve it for the physical device a human manipulates
-(which is an input device, by the way):
-
-rfkill switch:
-
- A physical device a human manipulates. Its state can be perceived by
- the kernel either directly (through a GPIO pin, ACPI GPE) or by its
- effect on a rfkill line of a wireless device.
-
-rfkill controller:
-
- A hardware circuit that controls the state of a rfkill line, which a
- kernel driver can interact with *to modify* that state (i.e. it has
- either write-only or read/write access).
-
-rfkill line:
-
- An input channel (hardware or software) of a wireless device, which
- causes a wireless transmitter to stop emitting energy (BLOCK) when it
- is active. Point of view is extremely important here: rfkill lines are
- always seen from the PoV of a wireless device (and its driver).
-
-soft rfkill line/software rfkill line:
-
- A rfkill line the wireless device driver can directly change the state
- of. Related to rfkill_state RFKILL_STATE_SOFT_BLOCKED.
-
-hard rfkill line/hardware rfkill line:
-
- A rfkill line that works fully in hardware or firmware, and that cannot
- be overridden by the kernel driver. The hardware device or the
- firmware just exports its status to the driver, but it is read-only.
- Related to rfkill_state RFKILL_STATE_HARD_BLOCKED.
-
-The enum rfkill_state describes the rfkill state of a transmitter:
-
-When a rfkill line or rfkill controller is in the RFKILL_STATE_UNBLOCKED state,
-the wireless transmitter (radio TX circuit for example) is *enabled*. When the
-it is in the RFKILL_STATE_SOFT_BLOCKED or RFKILL_STATE_HARD_BLOCKED, the
-wireless transmitter is to be *blocked* from operating.
-
-RFKILL_STATE_SOFT_BLOCKED indicates that a call to toggle_radio() can change
-that state. RFKILL_STATE_HARD_BLOCKED indicates that a call to toggle_radio()
-will not be able to change the state and will return with a suitable error if
-attempts are made to set the state to RFKILL_STATE_UNBLOCKED.
-
-RFKILL_STATE_HARD_BLOCKED is used by drivers to signal that the device is
-locked in the BLOCKED state by a hardwire rfkill line (typically an input pin
-that, when active, forces the transmitter to be disabled) which the driver
-CANNOT override.
-
-Full rfkill functionality requires two different subsystems to cooperate: the
-input layer and the rfkill class. The input layer issues *commands* to the
-entire system requesting that devices registered to the rfkill class change
-state. The way this interaction happens is not complex, but it is not obvious
-either:
-
-Kernel Input layer:
-
- * Generates KEY_WWAN, KEY_WLAN, KEY_BLUETOOTH, SW_RFKILL_ALL, and
- other such events when the user presses certain keys, buttons, or
- toggles certain physical switches.
-
- THE INPUT LAYER IS NEVER USED TO PROPAGATE STATUS, NOTIFICATIONS OR THE
- KIND OF STUFF AN ON-SCREEN-DISPLAY APPLICATION WOULD REPORT. It is
- used to issue *commands* for the system to change behaviour, and these
- commands may or may not be carried out by some kernel driver or
- userspace application. It follows that doing user feedback based only
- on input events is broken, as there is no guarantee that an input event
- will be acted upon.
-
- Most wireless communication device drivers implementing rfkill
- functionality MUST NOT generate these events, and have no reason to
- register themselves with the input layer. Doing otherwise is a common
- misconception. There is an API to propagate rfkill status change
- information, and it is NOT the input layer.
-
-rfkill class:
-
- * Calls a hook in a driver to effectively change the wireless
- transmitter state;
- * Keeps track of the wireless transmitter state (with help from
- the driver);
- * Generates userspace notifications (uevents) and a call to a
- notification chain (kernel) when there is a wireless transmitter
- state change;
- * Connects a wireless communications driver with the common rfkill
- control system, which, for example, allows actions such as
- "switch all bluetooth devices offline" to be carried out by
- userspace or by rfkill-input.
-
- THE RFKILL CLASS NEVER ISSUES INPUT EVENTS. THE RFKILL CLASS DOES
- NOT LISTEN TO INPUT EVENTS. NO DRIVER USING THE RFKILL CLASS SHALL
- EVER LISTEN TO, OR ACT ON RFKILL INPUT EVENTS. Doing otherwise is
- a layering violation.
-
- Most wireless data communication drivers in the kernel have just to
- implement the rfkill class API to work properly. Interfacing to the
- input layer is not often required (and is very often a *bug*) on
- wireless drivers.
-
- Platform drivers often have to attach to the input layer to *issue*
- (but never to listen to) rfkill events for rfkill switches, and also to
- the rfkill class to export a control interface for the platform rfkill
- controllers to the rfkill subsystem. This does NOT mean the rfkill
- switch is attached to a rfkill class (doing so is almost always wrong).
- It just means the same kernel module is the driver for different
- devices (rfkill switches and rfkill controllers).
-
-
-Userspace input handlers (uevents) or kernel input handlers (rfkill-input):
-
- * Implements the policy of what should happen when one of the input
- layer events related to rfkill operation is received.
- * Uses the sysfs interface (userspace) or private rfkill API calls
- to tell the devices registered with the rfkill class to change
- their state (i.e. translates the input layer event into real
- action).
-
- * rfkill-input implements EPO by handling EV_SW SW_RFKILL_ALL 0
- (power off all transmitters) in a special way: it ignores any
- overrides and local state cache and forces all transmitters to the
- RFKILL_STATE_SOFT_BLOCKED state (including those which are already
- supposed to be BLOCKED).
- * rfkill EPO will remain active until rfkill-input receives an
- EV_SW SW_RFKILL_ALL 1 event. While the EPO is active, transmitters
- are locked in the blocked state (rfkill will refuse to unblock them).
- * rfkill-input implements different policies that the user can
- select for handling EV_SW SW_RFKILL_ALL 1. It will unlock rfkill,
- and either do nothing (leave transmitters blocked, but now unlocked),
- restore the transmitters to their state before the EPO, or unblock
- them all.
-
-Userspace uevent handler or kernel platform-specific drivers hooked to the
-rfkill notifier chain:
-
- * Taps into the rfkill notifier chain or to KOBJ_CHANGE uevents,
- in order to know when a device that is registered with the rfkill
- class changes state;
- * Issues feedback notifications to the user;
- * In the rare platforms where this is required, synthesizes an input
- event to command all *OTHER* rfkill devices to also change their
- statues when a specific rfkill device changes state.
-
-
-===============================================================================
-3: Kernel driver guidelines
-
-Remember: point-of-view is everything for a driver that connects to the rfkill
-subsystem. All the details below must be measured/perceived from the point of
-view of the specific driver being modified.
-
-The first thing one needs to know is whether his driver should be talking to
-the rfkill class or to the input layer. In rare cases (platform drivers), it
-could happen that you need to do both, as platform drivers often handle a
-variety of devices in the same driver.
-
-Do not mistake input devices for rfkill controllers. The only type of "rfkill
-switch" device that is to be registered with the rfkill class are those
-directly controlling the circuits that cause a wireless transmitter to stop
-working (or the software equivalent of them), i.e. what we call a rfkill
-controller. Every other kind of "rfkill switch" is just an input device and
-MUST NOT be registered with the rfkill class.
-
-A driver should register a device with the rfkill class when ALL of the
-following conditions are met (they define a rfkill controller):
-
-1. The device is/controls a data communications wireless transmitter;
-
-2. The kernel can interact with the hardware/firmware to CHANGE the wireless
- transmitter state (block/unblock TX operation);
-
-3. The transmitter can be made to not emit any energy when "blocked":
- rfkill is not about blocking data transmissions, it is about blocking
- energy emission;
-
-A driver should register a device with the input subsystem to issue
-rfkill-related events (KEY_WLAN, KEY_BLUETOOTH, KEY_WWAN, KEY_WIMAX,
-SW_RFKILL_ALL, etc) when ALL of the folowing conditions are met:
-
-1. It is directly related to some physical device the user interacts with, to
- command the O.S./firmware/hardware to enable/disable a data communications
- wireless transmitter.
-
- Examples of the physical device are: buttons, keys and switches the user
- will press/touch/slide/switch to enable or disable the wireless
- communication device.
-
-2. It is NOT slaved to another device, i.e. there is no other device that
- issues rfkill-related input events in preference to this one.
-
- Please refer to the corner cases and examples section for more details.
-
-When in doubt, do not issue input events. For drivers that should generate
-input events in some platforms, but not in others (e.g. b43), the best solution
-is to NEVER generate input events in the first place. That work should be
-deferred to a platform-specific kernel module (which will know when to generate
-events through the rfkill notifier chain) or to userspace. This avoids the
-usual maintenance problems with DMI whitelisting.
+When the device is hard-blocked (either by a call to rfkill_set_hw_state()
+or from query_hw_block) set_block() will be invoked for additional software
+block, but drivers can ignore the method call since they can use the return
+value of the function rfkill_set_hw_state() to sync the software state
+instead of keeping track of calls to set_block(). In fact, drivers should
+use the return value of rfkill_set_hw_state() unless the hardware actually
+keeps track of soft and hard block separately.
-Corner cases and examples:
-====================================
-
-1. If the device is an input device that, because of hardware or firmware,
-causes wireless transmitters to be blocked regardless of the kernel's will, it
-is still just an input device, and NOT to be registered with the rfkill class.
+3. Kernel API
-2. If the wireless transmitter switch control is read-only, it is an input
-device and not to be registered with the rfkill class (and maybe not to be made
-an input layer event source either, see below).
-3. If there is some other device driver *closer* to the actual hardware the
-user interacted with (the button/switch/key) to issue an input event, THAT is
-the device driver that should be issuing input events.
-
-E.g:
- [RFKILL slider switch] -- [GPIO hardware] -- [WLAN card rf-kill input]
- (platform driver) (wireless card driver)
-
-The user is closer to the RFKILL slide switch plaform driver, so the driver
-which must issue input events is the platform driver looking at the GPIO
-hardware, and NEVER the wireless card driver (which is just a slave). It is
-very likely that there are other leaves than just the WLAN card rf-kill input
-(e.g. a bluetooth card, etc)...
+Drivers for radio transmitters normally implement an rfkill driver.
-On the other hand, some embedded devices do this:
+Platform drivers might implement input devices if the rfkill button is just
+that, a button. If that button influences the hardware then you need to
+implement an rfkill driver instead. This also applies if the platform provides
+a way to turn on/off the transmitter(s).
- [RFKILL slider switch] -- [WLAN card rf-kill input]
- (wireless card driver)
+For some platforms, it is possible that the hardware state changes during
+suspend/hibernation, in which case it will be necessary to update the rfkill
+core with the current state is at resume time.
-In this situation, the wireless card driver *could* register itself as an input
-device and issue rf-kill related input events... but in order to AVOID the need
-for DMI whitelisting, the wireless card driver does NOT do it. Userspace (HAL)
-or a platform driver (that exists only on these embedded devices) will do the
-dirty job of issuing the input events.
+To create an rfkill driver, driver's Kconfig needs to have
+ depends on RFKILL || !RFKILL
-COMMON MISTAKES in kernel drivers, related to rfkill:
-====================================
+to ensure the driver cannot be built-in when rfkill is modular. The !RFKILL
+case allows the driver to be built when rfkill is not configured, which which
+case all rfkill API can still be used but will be provided by static inlines
+which compile to almost nothing.
-1. NEVER confuse input device keys and buttons with input device switches.
+Calling rfkill_set_hw_state() when a state change happens is required from
+rfkill drivers that control devices that can be hard-blocked unless they also
+assign the poll_hw_block() callback (then the rfkill core will poll the
+device). Don't do this unless you cannot get the event in any other way.
- 1a. Switches are always set or reset. They report the current state
- (on position or off position).
- 1b. Keys and buttons are either in the pressed or not-pressed state, and
- that's it. A "button" that latches down when you press it, and
- unlatches when you press it again is in fact a switch as far as input
- devices go.
-Add the SW_* events you need for switches, do NOT try to emulate a button using
-KEY_* events just because there is no such SW_* event yet. Do NOT try to use,
-for example, KEY_BLUETOOTH when you should be using SW_BLUETOOTH instead.
+5. Userspace support
-2. Input device switches (sources of EV_SW events) DO store their current state
-(so you *must* initialize it by issuing a gratuitous input layer event on
-driver start-up and also when resuming from sleep), and that state CAN be
-queried from userspace through IOCTLs. There is no sysfs interface for this,
-but that doesn't mean you should break things trying to hook it to the rfkill
-class to get a sysfs interface :-)
+The recommended userspace interface to use is /dev/rfkill, which is a misc
+character device that allows userspace to obtain and set the state of rfkill
+devices and sets of devices. It also notifies userspace about device addition
+and removal. The API is a simple read/write API that is defined in
+linux/rfkill.h, with one ioctl that allows turning off the deprecated input
+handler in the kernel for the transition period.
-3. Do not issue *_RFKILL_ALL events by default, unless you are sure it is the
-correct event for your switch/button. These events are emergency power-off
-events when they are trying to turn the transmitters off. An example of an
-input device which SHOULD generate *_RFKILL_ALL events is the wireless-kill
-switch in a laptop which is NOT a hotkey, but a real sliding/rocker switch.
-An example of an input device which SHOULD NOT generate *_RFKILL_ALL events by
-default, is any sort of hot key that is type-specific (e.g. the one for WLAN).
+Except for the one ioctl, communication with the kernel is done via read()
+and write() of instances of 'struct rfkill_event'. In this structure, the
+soft and hard block are properly separated (unlike sysfs, see below) and
+userspace is able to get a consistent snapshot of all rfkill devices in the
+system. Also, it is possible to switch all rfkill drivers (or all drivers of
+a specified type) into a state which also updates the default state for
+hotplugged devices.
+After an application opens /dev/rfkill, it can read the current state of
+all devices, and afterwards can poll the descriptor for hotplug or state
+change events.
-3.1 Guidelines for wireless device drivers
-------------------------------------------
+Applications must ignore operations (the "op" field) they do not handle,
+this allows the API to be extended in the future.
-(in this text, rfkill->foo means the foo field of struct rfkill).
-
-1. Each independent transmitter in a wireless device (usually there is only one
-transmitter per device) should have a SINGLE rfkill class attached to it.
-
-2. If the device does not have any sort of hardware assistance to allow the
-driver to rfkill the device, the driver should emulate it by taking all actions
-required to silence the transmitter.
-
-3. If it is impossible to silence the transmitter (i.e. it still emits energy,
-even if it is just in brief pulses, when there is no data to transmit and there
-is no hardware support to turn it off) do NOT lie to the users. Do not attach
-it to a rfkill class. The rfkill subsystem does not deal with data
-transmission, it deals with energy emission. If the transmitter is emitting
-energy, it is not blocked in rfkill terms.
-
-4. It doesn't matter if the device has multiple rfkill input lines affecting
-the same transmitter, their combined state is to be exported as a single state
-per transmitter (see rule 1).
-
-This rule exists because users of the rfkill subsystem expect to get (and set,
-when possible) the overall transmitter rfkill state, not of a particular rfkill
-line.
-
-5. The wireless device driver MUST NOT leave the transmitter enabled during
-suspend and hibernation unless:
-
- 5.1. The transmitter has to be enabled for some sort of functionality
- like wake-on-wireless-packet or autonomous packed forwarding in a mesh
- network, and that functionality is enabled for this suspend/hibernation
- cycle.
-
-AND
-
- 5.2. The device was not on a user-requested BLOCKED state before
- the suspend (i.e. the driver must NOT unblock a device, not even
- to support wake-on-wireless-packet or remain in the mesh).
-
-In other words, there is absolutely no allowed scenario where a driver can
-automatically take action to unblock a rfkill controller (obviously, this deals
-with scenarios where soft-blocking or both soft and hard blocking is happening.
-Scenarios where hardware rfkill lines are the only ones blocking the
-transmitter are outside of this rule, since the wireless device driver does not
-control its input hardware rfkill lines in the first place).
-
-6. During resume, rfkill will try to restore its previous state.
-
-7. After a rfkill class is suspended, it will *not* call rfkill->toggle_radio
-until it is resumed.
-
-
-Example of a WLAN wireless driver connected to the rfkill subsystem:
---------------------------------------------------------------------
-
-A certain WLAN card has one input pin that causes it to block the transmitter
-and makes the status of that input pin available (only for reading!) to the
-kernel driver. This is a hard rfkill input line (it cannot be overridden by
-the kernel driver).
-
-The card also has one PCI register that, if manipulated by the driver, causes
-it to block the transmitter. This is a soft rfkill input line.
-
-It has also a thermal protection circuitry that shuts down its transmitter if
-the card overheats, and makes the status of that protection available (only for
-reading!) to the kernel driver. This is also a hard rfkill input line.
-
-If either one of these rfkill lines are active, the transmitter is blocked by
-the hardware and forced offline.
-
-The driver should allocate and attach to its struct device *ONE* instance of
-the rfkill class (there is only one transmitter).
-
-It can implement the get_state() hook, and return RFKILL_STATE_HARD_BLOCKED if
-either one of its two hard rfkill input lines are active. If the two hard
-rfkill lines are inactive, it must return RFKILL_STATE_SOFT_BLOCKED if its soft
-rfkill input line is active. Only if none of the rfkill input lines are
-active, will it return RFKILL_STATE_UNBLOCKED.
-
-Since the device has a hardware rfkill line, it IS subject to state changes
-external to rfkill. Therefore, the driver must make sure that it calls
-rfkill_force_state() to keep the status always up-to-date, and it must do a
-rfkill_force_state() on resume from sleep.
-
-Every time the driver gets a notification from the card that one of its rfkill
-lines changed state (polling might be needed on badly designed cards that don't
-generate interrupts for such events), it recomputes the rfkill state as per
-above, and calls rfkill_force_state() to update it.
-
-The driver should implement the toggle_radio() hook, that:
-
-1. Returns an error if one of the hardware rfkill lines are active, and the
-caller asked for RFKILL_STATE_UNBLOCKED.
-
-2. Activates the soft rfkill line if the caller asked for state
-RFKILL_STATE_SOFT_BLOCKED. It should do this even if one of the hard rfkill
-lines are active, effectively double-blocking the transmitter.
-
-3. Deactivates the soft rfkill line if none of the hardware rfkill lines are
-active and the caller asked for RFKILL_STATE_UNBLOCKED.
-
-===============================================================================
-4: Kernel API
-
-To build a driver with rfkill subsystem support, the driver should depend on
-(or select) the Kconfig symbol RFKILL; it should _not_ depend on RKFILL_INPUT.
-
-The hardware the driver talks to may be write-only (where the current state
-of the hardware is unknown), or read-write (where the hardware can be queried
-about its current state).
-
-The rfkill class will call the get_state hook of a device every time it needs
-to know the *real* current state of the hardware. This can happen often, but
-it does not do any polling, so it is not enough on hardware that is subject
-to state changes outside of the rfkill subsystem.
-
-Therefore, calling rfkill_force_state() when a state change happens is
-mandatory when the device has a hardware rfkill line, or when something else
-like the firmware could cause its state to be changed without going through the
-rfkill class.
-
-Some hardware provides events when its status changes. In these cases, it is
-best for the driver to not provide a get_state hook, and instead register the
-rfkill class *already* with the correct status, and keep it updated using
-rfkill_force_state() when it gets an event from the hardware.
-
-rfkill_force_state() must be used on the device resume handlers to update the
-rfkill status, should there be any chance of the device status changing during
-the sleep.
-
-There is no provision for a statically-allocated rfkill struct. You must
-use rfkill_allocate() to allocate one.
-
-You should:
- - rfkill_allocate()
- - modify rfkill fields (flags, name)
- - modify state to the current hardware state (THIS IS THE ONLY TIME
- YOU CAN ACCESS state DIRECTLY)
- - rfkill_register()
-
-The only way to set a device to the RFKILL_STATE_HARD_BLOCKED state is through
-a suitable return of get_state() or through rfkill_force_state().
-
-When a device is in the RFKILL_STATE_HARD_BLOCKED state, the only way to switch
-it to a different state is through a suitable return of get_state() or through
-rfkill_force_state().
-
-If toggle_radio() is called to set a device to state RFKILL_STATE_SOFT_BLOCKED
-when that device is already at the RFKILL_STATE_HARD_BLOCKED state, it should
-not return an error. Instead, it should try to double-block the transmitter,
-so that its state will change from RFKILL_STATE_HARD_BLOCKED to
-RFKILL_STATE_SOFT_BLOCKED should the hardware blocking cease.
-
-Please refer to the source for more documentation.
-
-===============================================================================
-5: Userspace support
-
-rfkill devices issue uevents (with an action of "change"), with the following
-environment variables set:
-
-RFKILL_NAME
-RFKILL_STATE
-RFKILL_TYPE
-
-The ABI for these variables is defined by the sysfs attributes. It is best
-to take a quick look at the source to make sure of the possible values.
-
-It is expected that HAL will trap those, and bridge them to DBUS, etc. These
-events CAN and SHOULD be used to give feedback to the user about the rfkill
-status of the system.
-
-Input devices may issue events that are related to rfkill. These are the
-various KEY_* events and SW_* events supported by rfkill-input.c.
-
-******IMPORTANT******
-When rfkill-input is ACTIVE, userspace is NOT TO CHANGE THE STATE OF AN RFKILL
-SWITCH IN RESPONSE TO AN INPUT EVENT also handled by rfkill-input, unless it
-has set to true the user_claim attribute for that particular switch. This rule
-is *absolute*; do NOT violate it.
-******IMPORTANT******
-
-Userspace must not assume it is the only source of control for rfkill switches.
-Their state CAN and WILL change due to firmware actions, direct user actions,
-and the rfkill-input EPO override for *_RFKILL_ALL.
-
-When rfkill-input is not active, userspace must initiate a rfkill status
-change by writing to the "state" attribute in order for anything to happen.
-
-Take particular care to implement EV_SW SW_RFKILL_ALL properly. When that
-switch is set to OFF, *every* rfkill device *MUST* be immediately put into the
-RFKILL_STATE_SOFT_BLOCKED state, no questions asked.
-
-The following sysfs entries will be created:
+Additionally, each rfkill device is registered in sysfs and there has the
+following attributes:
name: Name assigned by driver to this key (interface or driver name).
- type: Name of the key type ("wlan", "bluetooth", etc).
+ type: Driver type string ("wlan", "bluetooth", etc).
+ persistent: Whether the soft blocked state is initialised from
+ non-volatile storage at startup.
state: Current state of the transmitter
0: RFKILL_STATE_SOFT_BLOCKED
- transmitter is forced off, but one can override it
- by a write to the state attribute;
+ transmitter is turned off by software
1: RFKILL_STATE_UNBLOCKED
- transmiter is NOT forced off, and may operate if
- all other conditions for such operation are met
- (such as interface is up and configured, etc);
+ transmitter is (potentially) active
2: RFKILL_STATE_HARD_BLOCKED
transmitter is forced off by something outside of
- the driver's control. One cannot set a device to
- this state through writes to the state attribute;
- claim: 1: Userspace handles events, 0: Kernel handles events
-
-Both the "state" and "claim" entries are also writable. For the "state" entry
-this means that when 1 or 0 is written, the device rfkill state (if not yet in
-the requested state), will be will be toggled accordingly.
-
-For the "claim" entry writing 1 to it means that the kernel no longer handles
-key events even though RFKILL_INPUT input was enabled. When "claim" has been
-set to 0, userspace should make sure that it listens for the input events or
-check the sysfs "state" entry regularly to correctly perform the required tasks
-when the rkfill key is pressed.
-
-A note about input devices and EV_SW events:
-
-In order to know the current state of an input device switch (like
-SW_RFKILL_ALL), you will need to use an IOCTL. That information is not
-available through sysfs in a generic way at this time, and it is not available
-through the rfkill class AT ALL.
+ the driver's control.
+ This file is deprecated because it can only properly show
+ three of the four possible states, soft-and-hard-blocked is
+ missing.
+ claim: 0: Kernel handles events
+ This file is deprecated because there no longer is a way to
+ claim just control over a single rfkill instance.
+
+rfkill devices also issue uevents (with an action of "change"), with the
+following environment variables set:
+
+RFKILL_NAME
+RFKILL_STATE
+RFKILL_TYPE
+
+The contents of these variables corresponds to the "name", "state" and
+"type" sysfs files explained above.
diff --git a/Documentation/robust-futex-ABI.txt b/Documentation/robust-futex-ABI.txt
index 535f69fab45..fd1cd8aae4e 100644
--- a/Documentation/robust-futex-ABI.txt
+++ b/Documentation/robust-futex-ABI.txt
@@ -135,7 +135,7 @@ manipulating this list), the user code must observe the following
protocol on 'lock entry' insertion and removal:
On insertion:
- 1) set the 'list_op_pending' word to the address of the 'lock word'
+ 1) set the 'list_op_pending' word to the address of the 'lock entry'
to be inserted,
2) acquire the futex lock,
3) add the lock entry, with its thread id (TID) in the bottom 29 bits
@@ -143,7 +143,7 @@ On insertion:
4) clear the 'list_op_pending' word.
On removal:
- 1) set the 'list_op_pending' word to the address of the 'lock word'
+ 1) set the 'list_op_pending' word to the address of the 'lock entry'
to be removed,
2) remove the lock entry for this lock from the 'head' list,
2) release the futex lock, and
diff --git a/Documentation/s390/Debugging390.txt b/Documentation/s390/Debugging390.txt
index 10711d9f078..1eb576a023b 100644
--- a/Documentation/s390/Debugging390.txt
+++ b/Documentation/s390/Debugging390.txt
@@ -1984,7 +1984,7 @@ break *$pc
break *0x400618
-heres a really useful one for large programs
+Here's a really useful one for large programs
rbr
Set a breakpoint for all functions matching REGEXP
e.g.
@@ -2211,7 +2211,7 @@ Breakpoint 2 at 0x4d87a4: file top.c, line 2609.
#5 0x51692c in readline_internal () at readline.c:521
#6 0x5164fe in readline (prompt=0x7ffff810 "\177ÿøx\177ÿ÷Ø\177ÿøxÀ")
at readline.c:349
-#7 0x4d7a8a in command_line_input (prrompt=0x564420 "(gdb) ", repeat=1,
+#7 0x4d7a8a in command_line_input (prompt=0x564420 "(gdb) ", repeat=1,
annotation_suffix=0x4d6b44 "prompt") at top.c:2091
#8 0x4d6cf0 in command_loop () at top.c:1345
#9 0x4e25bc in main (argc=1, argv=0x7ffffdf4) at main.c:635
diff --git a/Documentation/scheduler/sched-nice-design.txt b/Documentation/scheduler/sched-nice-design.txt
index e2bae5a577e..3ac1e46d536 100644
--- a/Documentation/scheduler/sched-nice-design.txt
+++ b/Documentation/scheduler/sched-nice-design.txt
@@ -55,7 +55,7 @@ To sum it up: we always wanted to make nice levels more consistent, but
within the constraints of HZ and jiffies and their nasty design level
coupling to timeslices and granularity it was not really viable.
-The second (less frequent but still periodically occuring) complaint
+The second (less frequent but still periodically occurring) complaint
about Linux's nice level support was its assymetry around the origo
(which you can see demonstrated in the picture above), or more
accurately: the fact that nice level behavior depended on the _absolute_
diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
index 5ba4d3fc625..86eabe6c341 100644
--- a/Documentation/scheduler/sched-rt-group.txt
+++ b/Documentation/scheduler/sched-rt-group.txt
@@ -4,6 +4,7 @@
CONTENTS
========
+0. WARNING
1. Overview
1.1 The problem
1.2 The solution
@@ -14,6 +15,23 @@ CONTENTS
3. Future plans
+0. WARNING
+==========
+
+ Fiddling with these settings can result in an unstable system, the knobs are
+ root only and assumes root knows what he is doing.
+
+Most notable:
+
+ * very small values in sched_rt_period_us can result in an unstable
+ system when the period is smaller than either the available hrtimer
+ resolution, or the time it takes to handle the budget refresh itself.
+
+ * very small values in sched_rt_runtime_us can result in an unstable
+ system when the runtime is so small the system has difficulty making
+ forward progress (NOTE: the migration thread and kstopmachine both
+ are real-time processes).
+
1. Overview
===========
@@ -55,7 +73,7 @@ The remaining CPU time will be used for user input and other tasks. Because
realtime tasks have explicitly allocated the CPU time they need to perform
their tasks, buffer underruns in the graphics or audio can be eliminated.
-NOTE: the above example is not fully implemented as of yet (2.6.25). We still
+NOTE: the above example is not fully implemented yet. We still
lack an EDF scheduler to make non-uniform periods usable.
@@ -122,14 +140,15 @@ The other option is:
.o CONFIG_CGROUP_SCHED (aka "Basis for grouping tasks" = "Control groups")
-This uses the /cgroup virtual file system and "/cgroup/<cgroup>/cpu.rt_runtime_us"
-to control the CPU time reserved for each control group instead.
+This uses the /cgroup virtual file system and
+"/cgroup/<cgroup>/cpu.rt_runtime_us" to control the CPU time reserved for each
+control group instead.
For more information on working with control groups, you should read
Documentation/cgroups/cgroups.txt as well.
-Group settings are checked against the following limits in order to keep the configuration
-schedulable:
+Group settings are checked against the following limits in order to keep the
+configuration schedulable:
\Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
@@ -169,9 +188,9 @@ get their allocated time.
Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
the biggest challenge as the current linux PI infrastructure is geared towards
-the limited static priority levels 0-139. With deadline scheduling you need to
+the limited static priority levels 0-99. With deadline scheduling you need to
do deadline inheritance (since priority is inversely proportional to the
-deadline delta (deadline - now).
+deadline delta (deadline - now)).
This means the whole PI machinery will have to be reworked - and that is one of
the most complex pieces of code we have.
diff --git a/Documentation/scsi/aic79xx.txt b/Documentation/scsi/aic79xx.txt
index 683ccae00ad..c014eccaf19 100644
--- a/Documentation/scsi/aic79xx.txt
+++ b/Documentation/scsi/aic79xx.txt
@@ -194,7 +194,7 @@ The following information is available in this file:
- Packetized SCSI Protocol at 160MB/s and 320MB/s
- Quick Arbitration Selection (QAS)
- Retained Training Information (Rev B. ASIC only)
- - Interrupt Coalessing
+ - Interrupt Coalescing
- Initiator Mode (target mode not currently
supported)
- Support for the PCI-X standard up to 133MHz
diff --git a/Documentation/scsi/ncr53c8xx.txt b/Documentation/scsi/ncr53c8xx.txt
index 230e30846ef..08e2b4d04aa 100644
--- a/Documentation/scsi/ncr53c8xx.txt
+++ b/Documentation/scsi/ncr53c8xx.txt
@@ -206,7 +206,7 @@ of MOVE MEMORY instructions.
The 896 and the 895A allows handling of the phase mismatch context from
SCRIPTS (avoids the phase mismatch interrupt that stops the SCSI processor
until the C code has saved the context of the transfer).
-Implementing this without using LOAD/STORE instructions would be painfull
+Implementing this without using LOAD/STORE instructions would be painful
and I didn't even want to try it.
The 896 chip supports 64 bit PCI transactions and addressing, while the
@@ -240,7 +240,7 @@ characteristics. This feature may also reduce average command latency.
In order to really gain advantage of this feature, devices must have
a reasonable cache size (No miracle is to be expected for a low-end
hard disk with 128 KB or less).
-Some kown SCSI devices do not properly support tagged command queuing.
+Some known SCSI devices do not properly support tagged command queuing.
Generally, firmware revisions that fix this kind of problems are available
at respective vendor web/ftp sites.
All I can say is that the hard disks I use on my machines behave well with
diff --git a/Documentation/scsi/scsi_fc_transport.txt b/Documentation/scsi/scsi_fc_transport.txt
index e5b071d4661..d7f181701dc 100644
--- a/Documentation/scsi/scsi_fc_transport.txt
+++ b/Documentation/scsi/scsi_fc_transport.txt
@@ -1,10 +1,11 @@
SCSI FC Tansport
=============================================
-Date: 4/12/2007
+Date: 11/18/2008
Kernel Revisions for features:
rports : <<TBS>>
- vports : 2.6.22 (? TBD)
+ vports : 2.6.22
+ bsg support : 2.6.30 (?TBD?)
Introduction
@@ -15,6 +16,7 @@ The FC transport can be found at:
drivers/scsi/scsi_transport_fc.c
include/scsi/scsi_transport_fc.h
include/scsi/scsi_netlink_fc.h
+ include/scsi/scsi_bsg_fc.h
This file is found at Documentation/scsi/scsi_fc_transport.txt
@@ -472,6 +474,14 @@ int
fc_vport_terminate(struct fc_vport *vport)
+FC BSG support (CT & ELS passthru, and more)
+========================================================================
+<< To Be Supplied >>
+
+
+
+
+
Credits
=======
The following people have contributed to this document:
diff --git a/Documentation/scsi/scsi_mid_low_api.txt b/Documentation/scsi/scsi_mid_low_api.txt
index a6d5354639b..de67229251d 100644
--- a/Documentation/scsi/scsi_mid_low_api.txt
+++ b/Documentation/scsi/scsi_mid_low_api.txt
@@ -1271,6 +1271,11 @@ of interest:
hostdata[0] - area reserved for LLD at end of struct Scsi_Host. Size
is set by the second argument (named 'xtr_bytes') to
scsi_host_alloc() or scsi_register().
+ vendor_id - a unique value that identifies the vendor supplying
+ the LLD for the Scsi_Host. Used most often in validating
+ vendor-specific message requests. Value consists of an
+ identifier type and a vendor-specific value.
+ See scsi_netlink.h for a description of valid formats.
The scsi_host structure is defined in include/scsi/scsi_host.h
diff --git a/Documentation/scsi/sym53c8xx_2.txt b/Documentation/scsi/sym53c8xx_2.txt
index 49ea5c58c6b..eb9a7b905b6 100644
--- a/Documentation/scsi/sym53c8xx_2.txt
+++ b/Documentation/scsi/sym53c8xx_2.txt
@@ -206,7 +206,7 @@ characteristics. This feature may also reduce average command latency.
In order to really gain advantage of this feature, devices must have
a reasonable cache size (No miracle is to be expected for a low-end
hard disk with 128 KB or less).
-Some kown old SCSI devices do not properly support tagged command queuing.
+Some known old SCSI devices do not properly support tagged command queuing.
Generally, firmware revisions that fix this kind of problems are available
at respective vendor web/ftp sites.
All I can say is that I never have had problem with tagged queuing using
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 012858d2b11..4252697a95d 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -460,6 +460,25 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
The power-management is supported.
+ Module snd-ctxfi
+ ----------------
+
+ Module for Creative Sound Blaster X-Fi boards (20k1 / 20k2 chips)
+ * Creative Sound Blaster X-Fi Titanium Fatal1ty Champion Series
+ * Creative Sound Blaster X-Fi Titanium Fatal1ty Professional Series
+ * Creative Sound Blaster X-Fi Titanium Professional Audio
+ * Creative Sound Blaster X-Fi Titanium
+ * Creative Sound Blaster X-Fi Elite Pro
+ * Creative Sound Blaster X-Fi Platinum
+ * Creative Sound Blaster X-Fi Fatal1ty
+ * Creative Sound Blaster X-Fi XtremeGamer
+ * Creative Sound Blaster X-Fi XtremeMusic
+
+ reference_rate - reference sample rate, 44100 or 48000 (default)
+ multiple - multiple to ref. sample rate, 1 or 2 (default)
+
+ This module supports multiple cards.
+
Module snd-darla20
------------------
@@ -754,7 +773,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
single_cmd - Use single immediate commands to communicate with
codecs (for debugging only)
enable_msi - Enable Message Signaled Interrupt (MSI) (default = off)
- power_save - Automatic power-saving timtout (in second, 0 =
+ power_save - Automatic power-saving timeout (in second, 0 =
disable)
power_save_controller - Reset HD-audio controller in power-saving mode
(default = on)
@@ -925,6 +944,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
* Onkyo SE-90PCI
* Onkyo SE-200PCI
* ESI Juli@
+ * ESI Maya44
* Hercules Fortissimo IV
* EGO-SYS WaveTerminal 192M
@@ -933,7 +953,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
prodigy71xt, prodigy71hifi, prodigyhd2, prodigy192,
juli, aureon51, aureon71, universe, ap192, k8x800,
phase22, phase28, ms300, av710, se200pci, se90pci,
- fortissimo4, sn25p, WT192M
+ fortissimo4, sn25p, WT192M, maya44
This module supports multiple cards and autoprobe.
@@ -1093,6 +1113,13 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
This module supports multiple cards.
The driver requires the firmware loader support on kernel.
+ Module snd-lx6464es
+ -------------------
+
+ Module for Digigram LX6464ES boards
+
+ This module supports multiple cards.
+
Module snd-maestro3
-------------------
@@ -1543,13 +1570,15 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Module snd-sc6000
-----------------
- Module for Gallant SC-6000 soundcard.
+ Module for Gallant SC-6000 soundcard and later models: SC-6600
+ and SC-7000.
port - Port # (0x220 or 0x240)
mss_port - MSS Port # (0x530 or 0xe80)
irq - IRQ # (5,7,9,10,11)
mpu_irq - MPU-401 IRQ # (5,7,9,10) ,0 - no MPU-401 irq
dma - DMA # (1,3,0)
+ joystick - Enable gameport - 0 = disable (default), 1 = enable
This module supports multiple cards.
@@ -1859,7 +1888,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
-------------------
Module for sound cards based on the Asus AV100/AV200 chips,
- i.e., Xonar D1, DX, D2, D2X, HDAV1.3 (Deluxe), and Essence STX.
+ i.e., Xonar D1, DX, D2, D2X, HDAV1.3 (Deluxe), Essence ST
+ (Deluxe) and Essence STX.
This module supports autoprobe and multiple cards.
diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt
index 322869fc8a9..939a3dd5814 100644
--- a/Documentation/sound/alsa/HD-Audio-Models.txt
+++ b/Documentation/sound/alsa/HD-Audio-Models.txt
@@ -36,6 +36,7 @@ ALC260
acer Acer TravelMate
will Will laptops (PB V7900)
replacer Replacer 672V
+ favorit100 Maxdata Favorit 100XS
basic fixed pin assignment (old default model)
test for testing/debugging purpose, almost all controls can
adjusted. Appearing only when compiled with
@@ -85,10 +86,11 @@ ALC269
eeepc-p703 ASUS Eeepc P703 P900A
eeepc-p901 ASUS Eeepc P901 S101
fujitsu FSC Amilo
+ lifebook Fujitsu Lifebook S6420
auto auto-config reading BIOS (default)
-ALC662/663
-==========
+ALC662/663/272
+==============
3stack-dig 3-stack (2-channel) with SPDIF
3stack-6ch 3-stack (6-channel)
3stack-6ch-dig 3-stack (6-channel) with SPDIF
@@ -107,6 +109,9 @@ ALC662/663
asus-mode4 ASUS
asus-mode5 ASUS
asus-mode6 ASUS
+ dell Dell with ALC272
+ dell-zm1 Dell ZM1 with ALC272
+ samsung-nc10 Samsung NC10 mini notebook
auto auto-config reading BIOS (default)
ALC882/885
@@ -118,6 +123,7 @@ ALC882/885
asus-a7j ASUS A7J
asus-a7m ASUS A7M
macpro MacPro support
+ mb5 Macbook 5,1
mbp3 Macbook Pro rev3
imac24 iMac 24'' with jack detection
w2jc ASUS W2JC
@@ -133,10 +139,13 @@ ALC883/888
acer Acer laptops (Travelmate 3012WTMi, Aspire 5600, etc)
acer-aspire Acer Aspire 9810
acer-aspire-4930g Acer Aspire 4930G
+ acer-aspire-6530g Acer Aspire 6530G
+ acer-aspire-8930g Acer Aspire 8930G
medion Medion Laptops
medion-md2 Medion MD2
targa-dig Targa/MSI
- targa-2ch-dig Targs/MSI with 2-channel
+ targa-2ch-dig Targa/MSI with 2-channel
+ targa-8ch-dig Targa/MSI with 8-channel (MSI GX620)
laptop-eapd 3-jack with SPDIF I/O and EAPD (Clevo M540JE, M550JE)
lenovo-101e Lenovo 101E
lenovo-nb0763 Lenovo NB0763
@@ -150,6 +159,9 @@ ALC883/888
fujitsu-pi2515 Fujitsu AMILO Pi2515
fujitsu-xa3530 Fujitsu AMILO XA3530
3stack-6ch-intel Intel DG33* boards
+ asus-p5q ASUS P5Q-EM boards
+ mb31 MacBook 3,1
+ sony-vaio-tt Sony VAIO TT
auto auto-config reading BIOS (default)
ALC861/660
@@ -228,6 +240,7 @@ AD1986A
laptop-automute 2-channel with EAPD and HP-automute (Lenovo N100)
ultra 2-channel with EAPD (Samsung Ultra tablet PC)
samsung 2-channel with EAPD (Samsung R65)
+ samsung-p50 2-channel with HP-automute (Samsung P50)
AD1988/AD1988B/AD1989A/AD1989B
==============================
@@ -348,6 +361,7 @@ STAC92HD71B*
hp-m4 HP mini 1000
hp-dv5 HP dv series
hp-hdx HP HDX series
+ hp-dv4-1222nr HP dv4-1222nr (with LED support)
auto BIOS setup (default)
STAC92HD73*
diff --git a/Documentation/sound/alsa/HD-Audio.txt b/Documentation/sound/alsa/HD-Audio.txt
index 88b7433d2f1..71ac995b191 100644
--- a/Documentation/sound/alsa/HD-Audio.txt
+++ b/Documentation/sound/alsa/HD-Audio.txt
@@ -16,7 +16,7 @@ methods for the HD-audio hardware.
The HD-audio component consists of two parts: the controller chip and
the codec chips on the HD-audio bus. Linux provides a single driver
for all controllers, snd-hda-intel. Although the driver name contains
-a word of a well-known harware vendor, it's not specific to it but for
+a word of a well-known hardware vendor, it's not specific to it but for
all controller chips by other companies. Since the HD-audio
controllers are supposed to be compatible, the single snd-hda-driver
should work in most cases. But, not surprisingly, there are known
diff --git a/Documentation/sound/alsa/Procfile.txt b/Documentation/sound/alsa/Procfile.txt
index cfac20cf9e3..719a819f8cc 100644
--- a/Documentation/sound/alsa/Procfile.txt
+++ b/Documentation/sound/alsa/Procfile.txt
@@ -88,26 +88,39 @@ card*/pcm*/info
substreams, etc.
card*/pcm*/xrun_debug
- This file appears when CONFIG_SND_DEBUG=y.
- This shows the status of xrun (= buffer overrun/xrun) debug of
- ALSA PCM middle layer, as an integer from 0 to 2. The value
- can be changed by writing to this file, such as
-
- # cat 2 > /proc/asound/card0/pcm0p/xrun_debug
-
- When this value is greater than 0, the driver will show the
- messages to kernel log when an xrun is detected. The debug
- message is shown also when the invalid H/W pointer is detected
- at the update of periods (usually called from the interrupt
+ This file appears when CONFIG_SND_DEBUG=y and
+ CONFIG_PCM_XRUN_DEBUG=y.
+ This shows the status of xrun (= buffer overrun/xrun) and
+ invalid PCM position debug/check of ALSA PCM middle layer.
+ It takes an integer value, can be changed by writing to this
+ file, such as
+
+ # cat 5 > /proc/asound/card0/pcm0p/xrun_debug
+
+ The value consists of the following bit flags:
+ bit 0 = Enable XRUN/jiffies debug messages
+ bit 1 = Show stack trace at XRUN / jiffies check
+ bit 2 = Enable additional jiffies check
+ bit 3 = Log hwptr update at each period interrupt
+ bit 4 = Log hwptr update at each snd_pcm_update_hw_ptr()
+
+ When the bit 0 is set, the driver will show the messages to
+ kernel log when an xrun is detected. The debug message is
+ shown also when the invalid H/W pointer is detected at the
+ update of periods (usually called from the interrupt
handler).
- When this value is greater than 1, the driver will show the
- stack trace additionally. This may help the debugging.
+ When the bit 1 is set, the driver will show the stack trace
+ additionally. This may help the debugging.
- Since 2.6.30, this option also enables the hwptr check using
+ Since 2.6.30, this option can enable the hwptr check using
jiffies. This detects spontaneous invalid pointer callback
values, but can be lead to too much corrections for a (mostly
buggy) hardware that doesn't give smooth pointer updates.
+ This feature is enabled via the bit 2.
+
+ Bits 3 and 4 are for logging the hwptr records. Note that
+ these will give flood of kernel messages.
card*/pcm*/sub*/info
The general information of this PCM sub-stream.
diff --git a/Documentation/sound/alsa/README.maya44 b/Documentation/sound/alsa/README.maya44
new file mode 100644
index 00000000000..0e41576fa13
--- /dev/null
+++ b/Documentation/sound/alsa/README.maya44
@@ -0,0 +1,163 @@
+NOTE: The following is the original document of Rainer's patch that the
+current maya44 code based on. Some contents might be obsoleted, but I
+keep here as reference -- tiwai
+
+----------------------------------------------------------------
+
+STATE OF DEVELOPMENT:
+
+This driver is being developed on the initiative of Piotr Makowski (oponek@gmail.com) and financed by Lars Bergmann.
+Development is carried out by Rainer Zimmermann (mail@lightshed.de).
+
+ESI provided a sample Maya44 card for the development work.
+
+However, unfortunately it has turned out difficult to get detailed programming information, so I (Rainer Zimmermann) had to find out some card-specific information by experiment and conjecture. Some information (in particular, several GPIO bits) is still missing.
+
+This is the first testing version of the Maya44 driver released to the alsa-devel mailing list (Feb 5, 2008).
+
+
+The following functions work, as tested by Rainer Zimmermann and Piotr Makowski:
+
+- playback and capture at all sampling rates
+- input/output level
+- crossmixing
+- line/mic switch
+- phantom power switch
+- analogue monitor a.k.a bypass
+
+
+The following functions *should* work, but are not fully tested:
+
+- Channel 3+4 analogue - S/PDIF input switching
+- S/PDIF output
+- all inputs/outputs on the M/IO/DIO extension card
+- internal/external clock selection
+
+
+*In particular, we would appreciate testing of these functions by anyone who has access to an M/IO/DIO extension card.*
+
+
+Things that do not seem to work:
+
+- The level meters ("multi track") in 'alsamixer' do not seem to react to signals in (if this is a bug, it would probably be in the existing ICE1724 code).
+
+- Ardour 2.1 seems to work only via JACK, not using ALSA directly or via OSS. This still needs to be tracked down.
+
+
+DRIVER DETAILS:
+
+the following files were added:
+
+pci/ice1724/maya44.c - Maya44 specific code
+pci/ice1724/maya44.h
+pci/ice1724/ice1724.patch
+pci/ice1724/ice1724.h.patch - PROPOSED patch to ice1724.h (see SAMPLING RATES)
+i2c/other/wm8776.c - low-level access routines for Wolfson WM8776 codecs
+include/wm8776.h
+
+
+Note that the wm8776.c code is meant to be card-independent and does not actually register the codec with the ALSA infrastructure.
+This is done in maya44.c, mainly because some of the WM8776 controls are used in Maya44-specific ways, and should be named appropriately.
+
+
+the following files were created in pci/ice1724, simply #including the corresponding file from the alsa-kernel tree:
+
+wtm.h
+vt1720_mobo.h
+revo.h
+prodigy192.h
+pontis.h
+phase.h
+maya44.h
+juli.h
+aureon.h
+amp.h
+envy24ht.h
+se.h
+prodigy_hifi.h
+
+
+*I hope this is the correct way to do things.*
+
+
+SAMPLING RATES:
+
+The Maya44 card (or more exactly, the Wolfson WM8776 codecs) allow a maximum sampling rate of 192 kHz for playback and 92 kHz for capture.
+
+As the ICE1724 chip only allows one global sampling rate, this is handled as follows:
+
+* setting the sampling rate on any open PCM device on the maya44 card will always set the *global* sampling rate for all playback and capture channels.
+
+* In the current state of the driver, setting rates of up to 192 kHz is permitted even for capture devices.
+
+*AVOID CAPTURING AT RATES ABOVE 96kHz*, even though it may appear to work. The codec cannot actually capture at such rates, meaning poor quality.
+
+
+I propose some additional code for limiting the sampling rate when setting on a capture pcm device. However because of the global sampling rate, this logic would be somewhat problematic.
+
+The proposed code (currently deactivated) is in ice1712.h.patch, ice1724.c and maya44.c (in pci/ice1712).
+
+
+SOUND DEVICES:
+
+PCM devices correspond to inputs/outputs as follows (assuming Maya44 is card #0):
+
+hw:0,0 input - stereo, analog input 1+2
+hw:0,0 output - stereo, analog output 1+2
+hw:0,1 input - stereo, analog input 3+4 OR S/PDIF input
+hw:0,1 output - stereo, analog output 3+4 (and SPDIF out)
+
+
+NAMING OF MIXER CONTROLS:
+
+(for more information about the signal flow, please refer to the block diagram on p.24 of the ESI Maya44 manual, or in the ESI windows software).
+
+
+PCM: (digital) output level for channel 1+2
+PCM 1: same for channel 3+4
+
+Mic Phantom+48V: switch for +48V phantom power for electrostatic microphones on input 1/2.
+ Make sure this is not turned on while any other source is connected to input 1/2.
+ It might damage the source and/or the maya44 card.
+
+Mic/Line input: if switch is is on, input jack 1/2 is microphone input (mono), otherwise line input (stereo).
+
+Bypass: analogue bypass from ADC input to output for channel 1+2. Same as "Monitor" in the windows driver.
+Bypass 1: same for channel 3+4.
+
+Crossmix: cross-mixer from channels 1+2 to channels 3+4
+Crossmix 1: cross-mixer from channels 3+4 to channels 1+2
+
+IEC958 Output: switch for S/PDIF output.
+ This is not supported by the ESI windows driver.
+ S/PDIF should output the same signal as channel 3+4. [untested!]
+
+
+Digitial output selectors:
+
+ These switches allow a direct digital routing from the ADCs to the DACs.
+ Each switch determines where the digital input data to one of the DACs comes from.
+ They are not supported by the ESI windows driver.
+ For normal operation, they should all be set to "PCM out".
+
+H/W: Output source channel 1
+H/W 1: Output source channel 2
+H/W 2: Output source channel 3
+H/W 3: Output source channel 4
+
+H/W 4 ... H/W 9: unknown function, left in to enable testing.
+ Possibly some of these control S/PDIF output(s).
+ If these turn out to be unused, they will go away in later driver versions.
+
+Selectable values for each of the digital output selectors are:
+ "PCM out" -> DAC output of the corresponding channel (default setting)
+ "Input 1"...
+ "Input 4" -> direct routing from ADC output of the selected input channel
+
+
+--------
+
+Feb 14, 2008
+Rainer Zimmermann
+mail@lightshed.de
+
diff --git a/Documentation/sound/alsa/hda_codec.txt b/Documentation/sound/alsa/hda_codec.txt
index 34e87ec1379..de8efbc7e4b 100644
--- a/Documentation/sound/alsa/hda_codec.txt
+++ b/Documentation/sound/alsa/hda_codec.txt
@@ -114,7 +114,7 @@ For writing a sequence of verbs, use snd_hda_sequence_write().
There are variants of cached read/write, snd_hda_codec_write_cache(),
snd_hda_sequence_write_cache(). These are used for recording the
-register states for the power-mangement resume. When no PM is needed,
+register states for the power-management resume. When no PM is needed,
these are equivalent with non-cached version.
To retrieve the number of sub nodes connected to the given node, use
diff --git a/Documentation/sound/alsa/soc/dapm.txt b/Documentation/sound/alsa/soc/dapm.txt
index 9e6763264a2..9ac842be9b4 100644
--- a/Documentation/sound/alsa/soc/dapm.txt
+++ b/Documentation/sound/alsa/soc/dapm.txt
@@ -62,6 +62,7 @@ Audio DAPM widgets fall into a number of types:-
o Mic - Mic (and optional Jack)
o Line - Line Input/Output (and optional Jack)
o Speaker - Speaker
+ o Supply - Power or clock supply widget used by other widgets.
o Pre - Special PRE widget (exec before all others)
o Post - Special POST widget (exec after all others)
diff --git a/Documentation/spi/spidev_test.c b/Documentation/spi/spidev_test.c
index cf0e3ce0d52..c1a5aad3c75 100644
--- a/Documentation/spi/spidev_test.c
+++ b/Documentation/spi/spidev_test.c
@@ -99,11 +99,13 @@ void parse_opts(int argc, char *argv[])
{ "lsb", 0, 0, 'L' },
{ "cs-high", 0, 0, 'C' },
{ "3wire", 0, 0, '3' },
+ { "no-cs", 0, 0, 'N' },
+ { "ready", 0, 0, 'R' },
{ NULL, 0, 0, 0 },
};
int c;
- c = getopt_long(argc, argv, "D:s:d:b:lHOLC3", lopts, NULL);
+ c = getopt_long(argc, argv, "D:s:d:b:lHOLC3NR", lopts, NULL);
if (c == -1)
break;
@@ -139,6 +141,12 @@ void parse_opts(int argc, char *argv[])
case '3':
mode |= SPI_3WIRE;
break;
+ case 'N':
+ mode |= SPI_NO_CS;
+ break;
+ case 'R':
+ mode |= SPI_READY;
+ break;
default:
print_usage(argv[0]);
break;
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index f11ca7979fa..322a00bb99d 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -32,6 +32,7 @@ show up in /proc/sys/kernel:
- kstack_depth_to_print [ X86 only ]
- l2cr [ PPC only ]
- modprobe ==> Documentation/debugging-modules.txt
+- modules_disabled
- msgmax
- msgmnb
- msgmni
@@ -184,6 +185,16 @@ kernel stack.
==============================================================
+modules_disabled:
+
+A toggle value indicating if modules are allowed to be loaded
+in an otherwise modular kernel. This toggle defaults to off
+(0), but can be set true (1). Once true, modules can be
+neither loaded nor unloaded, and the toggle cannot be set back
+to false.
+
+==============================================================
+
osrelease, ostype & version:
# cat osrelease
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index c302ddf629a..c4de6359d44 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -233,8 +233,8 @@ These protections are added to score to judge whether this zone should be used
for page allocation or should be reclaimed.
In this example, if normal pages (index=2) are required to this DMA zone and
-pages_high is used for watermark, the kernel judges this zone should not be
-used because pages_free(1355) is smaller than watermark + protection[2]
+watermark[WMARK_HIGH] is used for watermark, the kernel judges this zone should
+not be used because pages_free(1355) is smaller than watermark + protection[2]
(4 + 2004 = 2008). If this protection value is 0, this zone would be used for
normal page requirement. If requirement is DMA zone(index=0), protection[0]
(=0) is used.
@@ -280,9 +280,10 @@ The default value is 65536.
min_free_kbytes:
This is used to force the Linux VM to keep a minimum number
-of kilobytes free. The VM uses this number to compute a pages_min
-value for each lowmem zone in the system. Each lowmem zone gets
-a number of reserved free pages based proportionally on its size.
+of kilobytes free. The VM uses this number to compute a
+watermark[WMARK_MIN] value for each lowmem zone in the system.
+Each lowmem zone gets a number of reserved free pages based
+proportionally on its size.
Some minimal amount of memory is needed to satisfy PF_MEMALLOC
allocations; if you set this to lower than 1024KB, your system will
@@ -314,10 +315,14 @@ min_unmapped_ratio:
This is available only on NUMA kernels.
-A percentage of the total pages in each zone. Zone reclaim will only
-occur if more than this percentage of pages are file backed and unmapped.
-This is to insure that a minimal amount of local pages is still available for
-file I/O even if the node is overallocated.
+This is a percentage of the total pages in each zone. Zone reclaim will
+only occur if more than this percentage of pages are in a state that
+zone_reclaim_mode allows to be reclaimed.
+
+If zone_reclaim_mode has the value 4 OR'd, then the percentage is compared
+against all file-backed unmapped pages including swapcache pages and tmpfs
+files. Otherwise, only unmapped pages backed by normal files but not tmpfs
+files and similar are considered.
The default is 1 percent.
@@ -358,7 +363,7 @@ nr_pdflush_threads
The current number of pdflush threads. This value is read-only.
The value changes according to the number of dirty pages in the system.
-When neccessary, additional pdflush threads are created, one per second, up to
+When necessary, additional pdflush threads are created, one per second, up to
nr_pdflush_threads_max.
==============================================================
@@ -565,7 +570,7 @@ swappiness
This control is used to define how aggressive the kernel will swap
memory pages. Higher values will increase agressiveness, lower values
-descrease the amount of swap.
+decrease the amount of swap.
The default value is 60.
diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
index cf42b820ff9..d56a0177542 100644
--- a/Documentation/sysrq.txt
+++ b/Documentation/sysrq.txt
@@ -66,7 +66,8 @@ On all - write a character to /proc/sysrq-trigger. e.g.:
'b' - Will immediately reboot the system without syncing or unmounting
your disks.
-'c' - Will perform a kexec reboot in order to take a crashdump.
+'c' - Will perform a system crash by a NULL pointer dereference.
+ A crashdump will be taken if configured.
'd' - Shows all locks that are held.
@@ -141,8 +142,8 @@ useful when you want to exit a program that will not let you switch consoles.
re'B'oot is good when you're unable to shut down. But you should also 'S'ync
and 'U'mount first.
-'C'rashdump can be used to manually trigger a crashdump when the system is hung.
-The kernel needs to have been built with CONFIG_KEXEC enabled.
+'C'rash can be used to manually trigger a crashdump when the system is hung.
+Note that this just triggers a crash if there is no dump mechanism available.
'S'ync is great when your system is locked up, it allows you to sync your
disks and will certainly lessen the chance of data loss and fscking. Note
diff --git a/Documentation/timers/hpet.txt b/Documentation/timers/hpet.txt
index e7c09abcfab..04763a32552 100644
--- a/Documentation/timers/hpet.txt
+++ b/Documentation/timers/hpet.txt
@@ -7,7 +7,7 @@ by Intel and Microsoft which can be found at
Each HPET has one fixed-rate counter (at 10+ MHz, hence "High Precision")
and up to 32 comparators. Normally three or more comparators are provided,
-each of which can generate oneshot interupts and at least one of which has
+each of which can generate oneshot interrupts and at least one of which has
additional hardware to support periodic interrupts. The comparators are
also called "timers", which can be misleading since usually timers are
independent of each other ... these share a counter, complicating resets.
diff --git a/Documentation/timers/timer_stats.txt b/Documentation/timers/timer_stats.txt
index 20d368c5981..9bd00fc2e82 100644
--- a/Documentation/timers/timer_stats.txt
+++ b/Documentation/timers/timer_stats.txt
@@ -62,7 +62,7 @@ Timerstats sample period: 3.888770 s
The first column is the number of events, the second column the pid, the third
column is the name of the process. The forth column shows the function which
-initialized the timer and in parantheses the callback function which was
+initialized the timer and in parenthesis the callback function which was
executed on expiry.
Thomas, Ingo
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
new file mode 100644
index 00000000000..f157d7594ea
--- /dev/null
+++ b/Documentation/trace/events.txt
@@ -0,0 +1,90 @@
+ Event Tracing
+
+ Documentation written by Theodore Ts'o
+ Updated by Li Zefan
+
+1. Introduction
+===============
+
+Tracepoints (see Documentation/trace/tracepoints.txt) can be used
+without creating custom kernel modules to register probe functions
+using the event tracing infrastructure.
+
+Not all tracepoints can be traced using the event tracing system;
+the kernel developer must provide code snippets which define how the
+tracing information is saved into the tracing buffer, and how the
+tracing information should be printed.
+
+2. Using Event Tracing
+======================
+
+2.1 Via the 'set_event' interface
+---------------------------------
+
+The events which are available for tracing can be found in the file
+/debug/tracing/available_events.
+
+To enable a particular event, such as 'sched_wakeup', simply echo it
+to /debug/tracing/set_event. For example:
+
+ # echo sched_wakeup >> /debug/tracing/set_event
+
+[ Note: '>>' is necessary, otherwise it will firstly disable
+ all the events. ]
+
+To disable an event, echo the event name to the set_event file prefixed
+with an exclamation point:
+
+ # echo '!sched_wakeup' >> /debug/tracing/set_event
+
+To disable all events, echo an empty line to the set_event file:
+
+ # echo > /debug/tracing/set_event
+
+To enable all events, echo '*:*' or '*:' to the set_event file:
+
+ # echo *:* > /debug/tracing/set_event
+
+The events are organized into subsystems, such as ext4, irq, sched,
+etc., and a full event name looks like this: <subsystem>:<event>. The
+subsystem name is optional, but it is displayed in the available_events
+file. All of the events in a subsystem can be specified via the syntax
+"<subsystem>:*"; for example, to enable all irq events, you can use the
+command:
+
+ # echo 'irq:*' > /debug/tracing/set_event
+
+2.2 Via the 'enable' toggle
+---------------------------
+
+The events available are also listed in /debug/tracing/events/ hierarchy
+of directories.
+
+To enable event 'sched_wakeup':
+
+ # echo 1 > /debug/tracing/events/sched/sched_wakeup/enable
+
+To disable it:
+
+ # echo 0 > /debug/tracing/events/sched/sched_wakeup/enable
+
+To enable all events in sched subsystem:
+
+ # echo 1 > /debug/tracing/events/sched/enable
+
+To eanble all events:
+
+ # echo 1 > /debug/tracing/events/enable
+
+When reading one of these enable files, there are four results:
+
+ 0 - all events this file affects are disabled
+ 1 - all events this file affects are enabled
+ X - there is a mixture of events enabled and disabled
+ ? - this file does not affect any event
+
+3. Defining an event-enabled tracepoint
+=======================================
+
+See The example provided in samples/trace_events
+
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index fd9a3e69381..a39b3c749de 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -7,7 +7,6 @@ Copyright 2008 Red Hat Inc.
(dual licensed under the GPL v2)
Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton,
John Kacur, and David Teigland.
-
Written for: 2.6.28-rc2
Introduction
@@ -33,13 +32,26 @@ The File System
Ftrace uses the debugfs file system to hold the control files as
well as the files to display output.
-To mount the debugfs system:
+When debugfs is configured into the kernel (which selecting any ftrace
+option will do) the directory /sys/kernel/debug will be created. To mount
+this directory, you can add to your /etc/fstab file:
+
+ debugfs /sys/kernel/debug debugfs defaults 0 0
+
+Or you can mount it at run time with:
+
+ mount -t debugfs nodev /sys/kernel/debug
- # mkdir /debug
- # mount -t debugfs nodev /debug
+For quicker access to that directory you may want to make a soft link to
+it:
-( Note: it is more common to mount at /sys/kernel/debug, but for
- simplicity this document will use /debug)
+ ln -s /sys/kernel/debug /debug
+
+Any selected ftrace option will also create a directory called tracing
+within the debugfs. The rest of the document will assume that you are in
+the ftrace directory (cd /sys/kernel/debug/tracing) and will only concentrate
+on the files within that directory and not distract from the content with
+the extended "/sys/kernel/debug/tracing" path name.
That's it! (assuming that you have ftrace configured into your kernel)
@@ -179,7 +191,7 @@ Here is the list of current tracers that may be configured.
Function call tracer to trace all kernel functions.
- "function_graph_tracer"
+ "function_graph"
Similar to the function tracer except that the
function tracer probes the functions on their entry
@@ -389,18 +401,18 @@ trace_options
The trace_options file is used to control what gets printed in
the trace output. To see what is available, simply cat the file:
- cat /debug/tracing/trace_options
+ cat trace_options
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj
To disable one of the options, echo in the option prepended with
"no".
- echo noprint-parent > /debug/tracing/trace_options
+ echo noprint-parent > trace_options
To enable an option, leave off the "no".
- echo sym-offset > /debug/tracing/trace_options
+ echo sym-offset > trace_options
Here are the available options:
@@ -476,11 +488,11 @@ sched_switch
This tracer simply records schedule switches. Here is an example
of how to use it.
- # echo sched_switch > /debug/tracing/current_tracer
- # echo 1 > /debug/tracing/tracing_enabled
+ # echo sched_switch > current_tracer
+ # echo 1 > tracing_enabled
# sleep 1
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/trace
+ # echo 0 > tracing_enabled
+ # cat trace
# tracer: sched_switch
#
@@ -518,9 +530,18 @@ priority with zero (0) being the highest priority and the nice
values starting at 100 (nice -20). Below is a quick chart to map
the kernel priority to user land priorities.
- Kernel priority: 0 to 99 ==> user RT priority 99 to 0
- Kernel priority: 100 to 139 ==> user nice -20 to 19
- Kernel priority: 140 ==> idle task priority
+ Kernel Space User Space
+ ===============================================================
+ 0(high) to 98(low) user RT priority 99(high) to 1(low)
+ with SCHED_RR or SCHED_FIFO
+ ---------------------------------------------------------------
+ 99 sched_priority is not used in scheduling
+ decisions(it must be specified as 0)
+ ---------------------------------------------------------------
+ 100(high) to 139(low) user nice -20(high) to 19(low)
+ ---------------------------------------------------------------
+ 140 idle task priority
+ ---------------------------------------------------------------
The task states are:
@@ -574,13 +595,13 @@ new trace is saved.
To reset the maximum, echo 0 into tracing_max_latency. Here is
an example:
- # echo irqsoff > /debug/tracing/current_tracer
- # echo 0 > /debug/tracing/tracing_max_latency
- # echo 1 > /debug/tracing/tracing_enabled
+ # echo irqsoff > current_tracer
+ # echo 0 > tracing_max_latency
+ # echo 1 > tracing_enabled
# ls -ltr
[...]
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/latency_trace
+ # echo 0 > tracing_enabled
+ # cat latency_trace
# tracer: irqsoff
#
irqsoff latency trace v1.1.5 on 2.6.26
@@ -681,13 +702,13 @@ Like the irqsoff tracer, it records the maximum latency for
which preemption was disabled. The control of preemptoff tracer
is much like the irqsoff tracer.
- # echo preemptoff > /debug/tracing/current_tracer
- # echo 0 > /debug/tracing/tracing_max_latency
- # echo 1 > /debug/tracing/tracing_enabled
+ # echo preemptoff > current_tracer
+ # echo 0 > tracing_max_latency
+ # echo 1 > tracing_enabled
# ls -ltr
[...]
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/latency_trace
+ # echo 0 > tracing_enabled
+ # cat latency_trace
# tracer: preemptoff
#
preemptoff latency trace v1.1.5 on 2.6.26-rc8
@@ -828,13 +849,13 @@ tracer.
Again, using this trace is much like the irqsoff and preemptoff
tracers.
- # echo preemptirqsoff > /debug/tracing/current_tracer
- # echo 0 > /debug/tracing/tracing_max_latency
- # echo 1 > /debug/tracing/tracing_enabled
+ # echo preemptirqsoff > current_tracer
+ # echo 0 > tracing_max_latency
+ # echo 1 > tracing_enabled
# ls -ltr
[...]
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/latency_trace
+ # echo 0 > tracing_enabled
+ # cat latency_trace
# tracer: preemptirqsoff
#
preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
@@ -990,12 +1011,12 @@ slightly differently than we did with the previous tracers.
Instead of performing an 'ls', we will run 'sleep 1' under
'chrt' which changes the priority of the task.
- # echo wakeup > /debug/tracing/current_tracer
- # echo 0 > /debug/tracing/tracing_max_latency
- # echo 1 > /debug/tracing/tracing_enabled
+ # echo wakeup > current_tracer
+ # echo 0 > tracing_max_latency
+ # echo 1 > tracing_enabled
# chrt -f 5 sleep 1
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/latency_trace
+ # echo 0 > tracing_enabled
+ # cat latency_trace
# tracer: wakeup
#
wakeup latency trace v1.1.5 on 2.6.26-rc8
@@ -1105,11 +1126,11 @@ can be done from the debug file system. Make sure the
ftrace_enabled is set; otherwise this tracer is a nop.
# sysctl kernel.ftrace_enabled=1
- # echo function > /debug/tracing/current_tracer
- # echo 1 > /debug/tracing/tracing_enabled
+ # echo function > current_tracer
+ # echo 1 > tracing_enabled
# usleep 1
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/trace
+ # echo 0 > tracing_enabled
+ # cat trace
# tracer: function
#
# TASK-PID CPU# TIMESTAMP FUNCTION
@@ -1146,7 +1167,7 @@ int trace_fd;
[...]
int main(int argc, char *argv[]) {
[...]
- trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY);
+ trace_fd = open(tracing_file("tracing_enabled"), O_WRONLY);
[...]
if (condition_hit()) {
write(trace_fd, "0", 1);
@@ -1154,26 +1175,20 @@ int main(int argc, char *argv[]) {
[...]
}
-Note: Here we hard coded the path name. The debugfs mount is not
-guaranteed to be at /debug (and is more commonly at
-/sys/kernel/debug). For simple one time traces, the above is
-sufficent. For anything else, a search through /proc/mounts may
-be needed to find where the debugfs file-system is mounted.
-
Single thread tracing
---------------------
-By writing into /debug/tracing/set_ftrace_pid you can trace a
+By writing into set_ftrace_pid you can trace a
single thread. For example:
-# cat /debug/tracing/set_ftrace_pid
+# cat set_ftrace_pid
no pid
-# echo 3111 > /debug/tracing/set_ftrace_pid
-# cat /debug/tracing/set_ftrace_pid
+# echo 3111 > set_ftrace_pid
+# cat set_ftrace_pid
3111
-# echo function > /debug/tracing/current_tracer
-# cat /debug/tracing/trace | head
+# echo function > current_tracer
+# cat trace | head
# tracer: function
#
# TASK-PID CPU# TIMESTAMP FUNCTION
@@ -1184,8 +1199,8 @@ no pid
yum-updatesd-3111 [003] 1637.254683: lock_hrtimer_base <-hrtimer_try_to_cancel
yum-updatesd-3111 [003] 1637.254685: fget_light <-do_sys_poll
yum-updatesd-3111 [003] 1637.254686: pipe_poll <-do_sys_poll
-# echo -1 > /debug/tracing/set_ftrace_pid
-# cat /debug/tracing/trace |head
+# echo -1 > set_ftrace_pid
+# cat trace |head
# tracer: function
#
# TASK-PID CPU# TIMESTAMP FUNCTION
@@ -1207,6 +1222,51 @@ something like this simple program:
#include <fcntl.h>
#include <unistd.h>
+#define _STR(x) #x
+#define STR(x) _STR(x)
+#define MAX_PATH 256
+
+const char *find_debugfs(void)
+{
+ static char debugfs[MAX_PATH+1];
+ static int debugfs_found;
+ char type[100];
+ FILE *fp;
+
+ if (debugfs_found)
+ return debugfs;
+
+ if ((fp = fopen("/proc/mounts","r")) == NULL) {
+ perror("/proc/mounts");
+ return NULL;
+ }
+
+ while (fscanf(fp, "%*s %"
+ STR(MAX_PATH)
+ "s %99s %*s %*d %*d\n",
+ debugfs, type) == 2) {
+ if (strcmp(type, "debugfs") == 0)
+ break;
+ }
+ fclose(fp);
+
+ if (strcmp(type, "debugfs") != 0) {
+ fprintf(stderr, "debugfs not mounted");
+ return NULL;
+ }
+
+ debugfs_found = 1;
+
+ return debugfs;
+}
+
+const char *tracing_file(const char *file_name)
+{
+ static char trace_file[MAX_PATH+1];
+ snprintf(trace_file, MAX_PATH, "%s/%s", find_debugfs(), file_name);
+ return trace_file;
+}
+
int main (int argc, char **argv)
{
if (argc < 1)
@@ -1217,12 +1277,12 @@ int main (int argc, char **argv)
char line[64];
int s;
- ffd = open("/debug/tracing/current_tracer", O_WRONLY);
+ ffd = open(tracing_file("current_tracer"), O_WRONLY);
if (ffd < 0)
exit(-1);
write(ffd, "nop", 3);
- fd = open("/debug/tracing/set_ftrace_pid", O_WRONLY);
+ fd = open(tracing_file("set_ftrace_pid"), O_WRONLY);
s = sprintf(line, "%d\n", getpid());
write(fd, line, s);
@@ -1374,22 +1434,22 @@ want, depending on your needs.
tracing_cpu_mask file) or you might sometimes see unordered
function calls while cpu tracing switch.
- hide: echo nofuncgraph-cpu > /debug/tracing/trace_options
- show: echo funcgraph-cpu > /debug/tracing/trace_options
+ hide: echo nofuncgraph-cpu > trace_options
+ show: echo funcgraph-cpu > trace_options
- The duration (function's time of execution) is displayed on
the closing bracket line of a function or on the same line
than the current function in case of a leaf one. It is default
enabled.
- hide: echo nofuncgraph-duration > /debug/tracing/trace_options
- show: echo funcgraph-duration > /debug/tracing/trace_options
+ hide: echo nofuncgraph-duration > trace_options
+ show: echo funcgraph-duration > trace_options
- The overhead field precedes the duration field in case of
reached duration thresholds.
- hide: echo nofuncgraph-overhead > /debug/tracing/trace_options
- show: echo funcgraph-overhead > /debug/tracing/trace_options
+ hide: echo nofuncgraph-overhead > trace_options
+ show: echo funcgraph-overhead > trace_options
depends on: funcgraph-duration
ie:
@@ -1418,8 +1478,8 @@ want, depending on your needs.
- The task/pid field displays the thread cmdline and pid which
executed the function. It is default disabled.
- hide: echo nofuncgraph-proc > /debug/tracing/trace_options
- show: echo funcgraph-proc > /debug/tracing/trace_options
+ hide: echo nofuncgraph-proc > trace_options
+ show: echo funcgraph-proc > trace_options
ie:
@@ -1442,8 +1502,8 @@ want, depending on your needs.
system clock since it started. A snapshot of this time is
given on each entry/exit of functions
- hide: echo nofuncgraph-abstime > /debug/tracing/trace_options
- show: echo funcgraph-abstime > /debug/tracing/trace_options
+ hide: echo nofuncgraph-abstime > trace_options
+ show: echo funcgraph-abstime > trace_options
ie:
@@ -1540,7 +1600,7 @@ listed in:
available_filter_functions
- # cat /debug/tracing/available_filter_functions
+ # cat available_filter_functions
put_prev_task_idle
kmem_cache_create
pick_next_task_rt
@@ -1552,12 +1612,12 @@ mutex_lock
If I am only interested in sys_nanosleep and hrtimer_interrupt:
# echo sys_nanosleep hrtimer_interrupt \
- > /debug/tracing/set_ftrace_filter
- # echo ftrace > /debug/tracing/current_tracer
- # echo 1 > /debug/tracing/tracing_enabled
+ > set_ftrace_filter
+ # echo ftrace > current_tracer
+ # echo 1 > tracing_enabled
# usleep 1
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/trace
+ # echo 0 > tracing_enabled
+ # cat trace
# tracer: ftrace
#
# TASK-PID CPU# TIMESTAMP FUNCTION
@@ -1568,7 +1628,7 @@ If I am only interested in sys_nanosleep and hrtimer_interrupt:
To see which functions are being traced, you can cat the file:
- # cat /debug/tracing/set_ftrace_filter
+ # cat set_ftrace_filter
hrtimer_interrupt
sys_nanosleep
@@ -1588,7 +1648,7 @@ Note: It is better to use quotes to enclose the wild cards,
otherwise the shell may expand the parameters into names
of files in the local directory.
- # echo 'hrtimer_*' > /debug/tracing/set_ftrace_filter
+ # echo 'hrtimer_*' > set_ftrace_filter
Produces:
@@ -1609,7 +1669,7 @@ Produces:
Notice that we lost the sys_nanosleep.
- # cat /debug/tracing/set_ftrace_filter
+ # cat set_ftrace_filter
hrtimer_run_queues
hrtimer_run_pending
hrtimer_init
@@ -1635,17 +1695,17 @@ To append to the filters, use '>>'
To clear out a filter so that all functions will be recorded
again:
- # echo > /debug/tracing/set_ftrace_filter
- # cat /debug/tracing/set_ftrace_filter
+ # echo > set_ftrace_filter
+ # cat set_ftrace_filter
#
Again, now we want to append.
- # echo sys_nanosleep > /debug/tracing/set_ftrace_filter
- # cat /debug/tracing/set_ftrace_filter
+ # echo sys_nanosleep > set_ftrace_filter
+ # cat set_ftrace_filter
sys_nanosleep
- # echo 'hrtimer_*' >> /debug/tracing/set_ftrace_filter
- # cat /debug/tracing/set_ftrace_filter
+ # echo 'hrtimer_*' >> set_ftrace_filter
+ # cat set_ftrace_filter
hrtimer_run_queues
hrtimer_run_pending
hrtimer_init
@@ -1668,7 +1728,7 @@ hrtimer_init_sleeper
The set_ftrace_notrace prevents those functions from being
traced.
- # echo '*preempt*' '*lock*' > /debug/tracing/set_ftrace_notrace
+ # echo '*preempt*' '*lock*' > set_ftrace_notrace
Produces:
@@ -1758,13 +1818,13 @@ the effect on the tracing is different. Every read from
trace_pipe is consumed. This means that subsequent reads will be
different. The trace is live.
- # echo function > /debug/tracing/current_tracer
- # cat /debug/tracing/trace_pipe > /tmp/trace.out &
+ # echo function > current_tracer
+ # cat trace_pipe > /tmp/trace.out &
[1] 4153
- # echo 1 > /debug/tracing/tracing_enabled
+ # echo 1 > tracing_enabled
# usleep 1
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/trace
+ # echo 0 > tracing_enabled
+ # cat trace
# tracer: function
#
# TASK-PID CPU# TIMESTAMP FUNCTION
@@ -1800,7 +1860,7 @@ number listed is the number of entries that can be recorded per
CPU. To know the full size, multiply the number of possible CPUS
with the number of entries.
- # cat /debug/tracing/buffer_size_kb
+ # cat buffer_size_kb
1408 (units kilobytes)
Note, to modify this, you must have tracing completely disabled.
@@ -1808,21 +1868,21 @@ To do that, echo "nop" into the current_tracer. If the
current_tracer is not set to "nop", an EINVAL error will be
returned.
- # echo nop > /debug/tracing/current_tracer
- # echo 10000 > /debug/tracing/buffer_size_kb
- # cat /debug/tracing/buffer_size_kb
+ # echo nop > current_tracer
+ # echo 10000 > buffer_size_kb
+ # cat buffer_size_kb
10000 (units kilobytes)
The number of pages which will be allocated is limited to a
percentage of available memory. Allocating too much will produce
an error.
- # echo 1000000000000 > /debug/tracing/buffer_size_kb
+ # echo 1000000000000 > buffer_size_kb
-bash: echo: write error: Cannot allocate memory
- # cat /debug/tracing/buffer_size_kb
+ # cat buffer_size_kb
85
-----------
More details can be found in the source code, in the
-kernel/tracing/*.c files.
+kernel/trace/*.c files.
diff --git a/Documentation/trace/kmemtrace.txt b/Documentation/trace/kmemtrace.txt
index a956d9b7f94..6308735e58c 100644
--- a/Documentation/trace/kmemtrace.txt
+++ b/Documentation/trace/kmemtrace.txt
@@ -64,7 +64,7 @@ III. Quick usage guide
CONFIG_KMEMTRACE).
2) Get the userspace tool and build it:
-$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository
+$ git clone git://repo.or.cz/kmemtrace-user.git # current repository
$ cd kmemtrace-user/
$ ./autogen.sh
$ ./configure
diff --git a/Documentation/trace/mmiotrace.txt b/Documentation/trace/mmiotrace.txt
index 5731c67abc5..162effbfbde 100644
--- a/Documentation/trace/mmiotrace.txt
+++ b/Documentation/trace/mmiotrace.txt
@@ -32,41 +32,41 @@ is no way to automatically detect if you are losing events due to CPUs racing.
Usage Quick Reference
---------------------
-$ mount -t debugfs debugfs /debug
-$ echo mmiotrace > /debug/tracing/current_tracer
-$ cat /debug/tracing/trace_pipe > mydump.txt &
+$ mount -t debugfs debugfs /sys/kernel/debug
+$ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
+$ cat /sys/kernel/debug/tracing/trace_pipe > mydump.txt &
Start X or whatever.
-$ echo "X is up" > /debug/tracing/trace_marker
-$ echo nop > /debug/tracing/current_tracer
+$ echo "X is up" > /sys/kernel/debug/tracing/trace_marker
+$ echo nop > /sys/kernel/debug/tracing/current_tracer
Check for lost events.
Usage
-----
-Make sure debugfs is mounted to /debug. If not, (requires root privileges)
-$ mount -t debugfs debugfs /debug
+Make sure debugfs is mounted to /sys/kernel/debug. If not, (requires root privileges)
+$ mount -t debugfs debugfs /sys/kernel/debug
Check that the driver you are about to trace is not loaded.
Activate mmiotrace (requires root privileges):
-$ echo mmiotrace > /debug/tracing/current_tracer
+$ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
Start storing the trace:
-$ cat /debug/tracing/trace_pipe > mydump.txt &
+$ cat /sys/kernel/debug/tracing/trace_pipe > mydump.txt &
The 'cat' process should stay running (sleeping) in the background.
Load the driver you want to trace and use it. Mmiotrace will only catch MMIO
accesses to areas that are ioremapped while mmiotrace is active.
During tracing you can place comments (markers) into the trace by
-$ echo "X is up" > /debug/tracing/trace_marker
+$ echo "X is up" > /sys/kernel/debug/tracing/trace_marker
This makes it easier to see which part of the (huge) trace corresponds to
which action. It is recommended to place descriptive markers about what you
do.
Shut down mmiotrace (requires root privileges):
-$ echo nop > /debug/tracing/current_tracer
+$ echo nop > /sys/kernel/debug/tracing/current_tracer
The 'cat' process exits. If it does not, kill it by issuing 'fg' command and
pressing ctrl+c.
@@ -78,10 +78,10 @@ to view your kernel log and look for "mmiotrace has lost events" warning. If
events were lost, the trace is incomplete. You should enlarge the buffers and
try again. Buffers are enlarged by first seeing how large the current buffers
are:
-$ cat /debug/tracing/buffer_size_kb
+$ cat /sys/kernel/debug/tracing/buffer_size_kb
gives you a number. Approximately double this number and write it back, for
instance:
-$ echo 128000 > /debug/tracing/buffer_size_kb
+$ echo 128000 > /sys/kernel/debug/tracing/buffer_size_kb
Then start again from the top.
If you are doing a trace for a driver project, e.g. Nouveau, you should also
diff --git a/Documentation/trace/power.txt b/Documentation/trace/power.txt
new file mode 100644
index 00000000000..cd805e16dc2
--- /dev/null
+++ b/Documentation/trace/power.txt
@@ -0,0 +1,17 @@
+The power tracer collects detailed information about C-state and P-state
+transitions, instead of just looking at the high-level "average"
+information.
+
+There is a helper script found in scrips/tracing/power.pl in the kernel
+sources which can be used to parse this information and create a
+Scalable Vector Graphics (SVG) picture from the trace data.
+
+To use this tracer:
+
+ echo 0 > /sys/kernel/debug/tracing/tracing_enabled
+ echo power > /sys/kernel/debug/tracing/current_tracer
+ echo 1 > /sys/kernel/debug/tracing/tracing_enabled
+ sleep 1
+ echo 0 > /sys/kernel/debug/tracing/tracing_enabled
+ cat /sys/kernel/debug/tracing/trace | \
+ perl scripts/tracing/power.pl > out.sv
diff --git a/Documentation/usb/WUSB-Design-overview.txt b/Documentation/usb/WUSB-Design-overview.txt
index 4c3d62c7843..c480e9c32db 100644
--- a/Documentation/usb/WUSB-Design-overview.txt
+++ b/Documentation/usb/WUSB-Design-overview.txt
@@ -84,7 +84,7 @@ The different logical parts of this driver are:
*UWB*: the Ultra-Wide-Band stack -- manages the radio and
associated spectrum to allow for devices sharing it. Allows to
- control bandwidth assingment, beaconing, scanning, etc
+ control bandwidth assignment, beaconing, scanning, etc
*
@@ -184,7 +184,7 @@ and sends the replies and notifications back to the API
[/uwb_rc_neh_grok()/]. Notifications are handled to the UWB daemon, that
is chartered, among other things, to keep the tab of how the UWB radio
neighborhood looks, creating and destroying devices as they show up or
-dissapear.
+disappear.
Command execution is very simple: a command block is sent and a event
block or reply is expected back. For sending/receiving command/events, a
@@ -333,7 +333,7 @@ read descriptors and move our data.
*Device life cycle and keep alives*
-Everytime there is a succesful transfer to/from a device, we update a
+Every time there is a successful transfer to/from a device, we update a
per-device activity timestamp. If not, every now and then we check and
if the activity timestamp gets old, we ping the device by sending it a
Keep Alive IE; it responds with a /DN_Alive/ pong during the DNTS (this
@@ -411,7 +411,7 @@ context (wa_xfer) and submit it. When the xfer is done, our callback is
called and we assign the status bits and release the xfer resources.
In dequeue() we are basically cancelling/aborting the transfer. We issue
-a xfer abort request to the HC, cancell all the URBs we had submitted
+a xfer abort request to the HC, cancel all the URBs we had submitted
and not yet done and when all that is done, the xfer callback will be
called--this will call the URB callback.
diff --git a/Documentation/usb/anchors.txt b/Documentation/usb/anchors.txt
index 6f24f566955..fe6a99a32bb 100644
--- a/Documentation/usb/anchors.txt
+++ b/Documentation/usb/anchors.txt
@@ -27,7 +27,7 @@ Association and disassociation of URBs with anchors
An association of URBs to an anchor is made by an explicit
call to usb_anchor_urb(). The association is maintained until
-an URB is finished by (successfull) completion. Thus disassociation
+an URB is finished by (successful) completion. Thus disassociation
is automatic. A function is provided to forcibly finish (kill)
all URBs associated with an anchor.
Furthermore, disassociation can be made with usb_unanchor_urb()
@@ -76,4 +76,4 @@ usb_get_from_anchor()
Returns the oldest anchored URB of an anchor. The URB is unanchored
and returned with a reference. As you may mix URBs to several
destinations in one anchor you have no guarantee the chronologically
-first submitted URB is returned. \ No newline at end of file
+first submitted URB is returned.
diff --git a/Documentation/usb/callbacks.txt b/Documentation/usb/callbacks.txt
index 7c812411945..bfb36b34b79 100644
--- a/Documentation/usb/callbacks.txt
+++ b/Documentation/usb/callbacks.txt
@@ -65,7 +65,7 @@ Accept or decline an interface. If you accept the device return 0,
otherwise -ENODEV or -ENXIO. Other error codes should be used only if a
genuine error occurred during initialisation which prevented a driver
from accepting a device that would else have been accepted.
-You are strongly encouraged to use usbcore'sfacility,
+You are strongly encouraged to use usbcore's facility,
usb_set_intfdata(), to associate a data structure with an interface, so
that you know which internal state and identity you associate with a
particular interface. The device will not be suspended and you may do IO
diff --git a/Documentation/video4linux/CARDLIST.cx23885 b/Documentation/video4linux/CARDLIST.cx23885
index 91aa3c0f0dd..450b8f8c389 100644
--- a/Documentation/video4linux/CARDLIST.cx23885
+++ b/Documentation/video4linux/CARDLIST.cx23885
@@ -16,3 +16,8 @@
15 -> TeVii S470 [d470:9022]
16 -> DVBWorld DVB-S2 2005 [0001:2005]
17 -> NetUP Dual DVB-S2 CI [1b55:2a2c]
+ 18 -> Hauppauge WinTV-HVR1270 [0070:2211]
+ 19 -> Hauppauge WinTV-HVR1275 [0070:2215]
+ 20 -> Hauppauge WinTV-HVR1255 [0070:2251]
+ 21 -> Hauppauge WinTV-HVR1210 [0070:2291,0070:2295]
+ 22 -> Mygica X8506 DMB-TH [14f1:8651]
diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88
index 71e9db0b26f..0736518b2f8 100644
--- a/Documentation/video4linux/CARDLIST.cx88
+++ b/Documentation/video4linux/CARDLIST.cx88
@@ -6,8 +6,8 @@
5 -> Leadtek Winfast 2000XP Expert [107d:6611,107d:6613]
6 -> AverTV Studio 303 (M126) [1461:000b]
7 -> MSI TV-@nywhere Master [1462:8606]
- 8 -> Leadtek Winfast DV2000 [107d:6620]
- 9 -> Leadtek PVR 2000 [107d:663b,107d:663c,107d:6632]
+ 8 -> Leadtek Winfast DV2000 [107d:6620,107d:6621]
+ 9 -> Leadtek PVR 2000 [107d:663b,107d:663c,107d:6632,107d:6630,107d:6638,107d:6631,107d:6637,107d:663d]
10 -> IODATA GV-VCP3/PCI [10fc:d003]
11 -> Prolink PlayTV PVR
12 -> ASUS PVR-416 [1043:4823,1461:c111]
@@ -59,7 +59,7 @@
58 -> Pinnacle PCTV HD 800i [11bd:0051]
59 -> DViCO FusionHDTV 5 PCI nano [18ac:d530]
60 -> Pinnacle Hybrid PCTV [12ab:1788]
- 61 -> Winfast TV2000 XP Global [107d:6f18]
+ 61 -> Leadtek TV2000 XP Global [107d:6f18,107d:6618]
62 -> PowerColor RA330 [14f1:ea3d]
63 -> Geniatech X8000-MT DVBT [14f1:8852]
64 -> DViCO FusionHDTV DVB-T PRO [18ac:db30]
@@ -78,3 +78,5 @@
77 -> TBS 8910 DVB-S [8910:8888]
78 -> Prof 6200 DVB-S [b022:3022]
79 -> Terratec Cinergy HT PCI MKII [153b:1177]
+ 80 -> Hauppauge WinTV-IR Only [0070:9290]
+ 81 -> Leadtek WinFast DTV1800 Hybrid [107d:6654]
diff --git a/Documentation/video4linux/CARDLIST.em28xx b/Documentation/video4linux/CARDLIST.em28xx
index 78d0a6eed57..e352d754875 100644
--- a/Documentation/video4linux/CARDLIST.em28xx
+++ b/Documentation/video4linux/CARDLIST.em28xx
@@ -1,5 +1,5 @@
0 -> Unknown EM2800 video grabber (em2800) [eb1a:2800]
- 1 -> Unknown EM2750/28xx video grabber (em2820/em2840) [eb1a:2820,eb1a:2821,eb1a:2860,eb1a:2861,eb1a:2870,eb1a:2881,eb1a:2883]
+ 1 -> Unknown EM2750/28xx video grabber (em2820/em2840) [eb1a:2710,eb1a:2820,eb1a:2821,eb1a:2860,eb1a:2861,eb1a:2870,eb1a:2881,eb1a:2883]
2 -> Terratec Cinergy 250 USB (em2820/em2840) [0ccd:0036]
3 -> Pinnacle PCTV USB 2 (em2820/em2840) [2304:0208]
4 -> Hauppauge WinTV USB 2 (em2820/em2840) [2040:4200,2040:4201]
@@ -17,10 +17,10 @@
16 -> Hauppauge WinTV HVR 950 (em2883) [2040:6513,2040:6517,2040:651b]
17 -> Pinnacle PCTV HD Pro Stick (em2880) [2304:0227]
18 -> Hauppauge WinTV HVR 900 (R2) (em2880) [2040:6502]
- 19 -> PointNix Intra-Oral Camera (em2860)
+ 19 -> EM2860/SAA711X Reference Design (em2860)
20 -> AMD ATI TV Wonder HD 600 (em2880) [0438:b002]
21 -> eMPIA Technology, Inc. GrabBeeX+ Video Encoder (em2800) [eb1a:2801]
- 22 -> Unknown EM2750/EM2751 webcam grabber (em2750) [eb1a:2750,eb1a:2751]
+ 22 -> EM2710/EM2750/EM2751 webcam grabber (em2750) [eb1a:2750,eb1a:2751]
23 -> Huaqi DLCW-130 (em2750)
24 -> D-Link DUB-T210 TV Tuner (em2820/em2840) [2001:f112]
25 -> Gadmei UTV310 (em2820/em2840)
@@ -61,3 +61,9 @@
63 -> Kaiomy TVnPC U2 (em2860) [eb1a:e303]
64 -> Easy Cap Capture DC-60 (em2860)
65 -> IO-DATA GV-MVP/SZ (em2820/em2840) [04bb:0515]
+ 66 -> Empire dual TV (em2880)
+ 67 -> Terratec Grabby (em2860) [0ccd:0096]
+ 68 -> Terratec AV350 (em2860) [0ccd:0084]
+ 69 -> KWorld ATSC 315U HDTV TV Box (em2882) [eb1a:a313]
+ 70 -> Evga inDtube (em2882)
+ 71 -> Silvercrest Webcam 1.3mpix (em2820/em2840)
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134
index 6dacf282525..c913e561419 100644
--- a/Documentation/video4linux/CARDLIST.saa7134
+++ b/Documentation/video4linux/CARDLIST.saa7134
@@ -124,10 +124,10 @@
123 -> Beholder BeholdTV 407 [0000:4070]
124 -> Beholder BeholdTV 407 FM [0000:4071]
125 -> Beholder BeholdTV 409 [0000:4090]
-126 -> Beholder BeholdTV 505 FM/RDS [0000:5051,0000:505B,5ace:5050]
-127 -> Beholder BeholdTV 507 FM/RDS / BeholdTV 509 FM [0000:5071,0000:507B,5ace:5070,5ace:5090]
+126 -> Beholder BeholdTV 505 FM [5ace:5050]
+127 -> Beholder BeholdTV 507 FM / BeholdTV 509 FM [5ace:5070,5ace:5090]
128 -> Beholder BeholdTV Columbus TVFM [0000:5201]
-129 -> Beholder BeholdTV 607 / BeholdTV 609 [5ace:6070,5ace:6071,5ace:6072,5ace:6073,5ace:6090,5ace:6091,5ace:6092,5ace:6093]
+129 -> Beholder BeholdTV 607 FM [5ace:6070]
130 -> Beholder BeholdTV M6 [5ace:6190]
131 -> Twinhan Hybrid DTV-DVB 3056 PCI [1822:0022]
132 -> Genius TVGO AM11MCE
@@ -143,7 +143,7 @@
142 -> Beholder BeholdTV H6 [5ace:6290]
143 -> Beholder BeholdTV M63 [5ace:6191]
144 -> Beholder BeholdTV M6 Extra [5ace:6193]
-145 -> AVerMedia MiniPCI DVB-T Hybrid M103 [1461:f636]
+145 -> AVerMedia MiniPCI DVB-T Hybrid M103 [1461:f636,1461:f736]
146 -> ASUSTeK P7131 Analog
147 -> Asus Tiger 3in1 [1043:4878]
148 -> Encore ENLTV-FM v5.3 [1a7f:2008]
@@ -153,5 +153,17 @@
152 -> Asus Tiger Rev:1.00 [1043:4857]
153 -> Kworld Plus TV Analog Lite PCI [17de:7128]
154 -> Avermedia AVerTV GO 007 FM Plus [1461:f31d]
-155 -> Hauppauge WinTV-HVR1120 ATSC/QAM-Hybrid [0070:6706,0070:6708]
-156 -> Hauppauge WinTV-HVR1110r3 [0070:6707,0070:6709,0070:670a]
+155 -> Hauppauge WinTV-HVR1150 ATSC/QAM-Hybrid [0070:6706,0070:6708]
+156 -> Hauppauge WinTV-HVR1120 DVB-T/Hybrid [0070:6707,0070:6709,0070:670a]
+157 -> Avermedia AVerTV Studio 507UA [1461:a11b]
+158 -> AVerMedia Cardbus TV/Radio (E501R) [1461:b7e9]
+159 -> Beholder BeholdTV 505 RDS [0000:505B]
+160 -> Beholder BeholdTV 507 RDS [0000:5071]
+161 -> Beholder BeholdTV 507 RDS [0000:507B]
+162 -> Beholder BeholdTV 607 FM [5ace:6071]
+163 -> Beholder BeholdTV 609 FM [5ace:6090]
+164 -> Beholder BeholdTV 609 FM [5ace:6091]
+165 -> Beholder BeholdTV 607 RDS [5ace:6072]
+166 -> Beholder BeholdTV 607 RDS [5ace:6073]
+167 -> Beholder BeholdTV 609 RDS [5ace:6092]
+168 -> Beholder BeholdTV 609 RDS [5ace:6093]
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner
index 691d2f37dc5..be67844074d 100644
--- a/Documentation/video4linux/CARDLIST.tuner
+++ b/Documentation/video4linux/CARDLIST.tuner
@@ -76,3 +76,5 @@ tuner=75 - Philips TEA5761 FM Radio
tuner=76 - Xceive 5000 tuner
tuner=77 - TCL tuner MF02GIP-5N-E
tuner=78 - Philips FMD1216MEX MK3 Hybrid Tuner
+tuner=79 - Philips PAL/SECAM multi (FM1216 MK5)
+tuner=80 - Philips FQ1216LME MK3 PAL/SECAM w/active loopthrough
diff --git a/Documentation/video4linux/cx18.txt b/Documentation/video4linux/cx18.txt
index 914cb7e734a..4652c0f5da3 100644
--- a/Documentation/video4linux/cx18.txt
+++ b/Documentation/video4linux/cx18.txt
@@ -11,7 +11,7 @@ encoder chip:
2) Some people have problems getting the i2c bus to work.
The symptom is that the eeprom cannot be read and the card is
unusable. This is probably fixed, but if you have problems
- then post to the video4linux or ivtv-users mailinglist.
+ then post to the video4linux or ivtv-users mailing list.
3) VBI (raw or sliced) has not yet been implemented.
diff --git a/Documentation/video4linux/gspca.txt b/Documentation/video4linux/gspca.txt
index 98529e03a46..573f95b5880 100644
--- a/Documentation/video4linux/gspca.txt
+++ b/Documentation/video4linux/gspca.txt
@@ -44,7 +44,9 @@ zc3xx 0458:7007 Genius VideoCam V2
zc3xx 0458:700c Genius VideoCam V3
zc3xx 0458:700f Genius VideoCam Web V2
sonixj 0458:7025 Genius Eye 311Q
+sn9c20x 0458:7029 Genius Look 320s
sonixj 0458:702e Genius Slim 310 NB
+sn9c20x 045e:00f4 LifeCam VX-6000 (SN9C20x + OV9650)
sonixj 045e:00f5 MicroSoft VX3000
sonixj 045e:00f7 MicroSoft VX1000
ov519 045e:028c Micro$oft xbox cam
@@ -163,10 +165,11 @@ sunplus 055f:c650 Mustek MDC5500Z
zc3xx 055f:d003 Mustek WCam300A
zc3xx 055f:d004 Mustek WCam300 AN
conex 0572:0041 Creative Notebook cx11646
-ov519 05a9:0519 OmniVision
+ov519 05a9:0519 OV519 Microphone
ov519 05a9:0530 OmniVision
-ov519 05a9:4519 OmniVision
+ov519 05a9:4519 Webcam Classic
ov519 05a9:8519 OmniVision
+ov519 05a9:a518 D-Link DSB-C310 Webcam
sunplus 05da:1018 Digital Dream Enigma 1.3
stk014 05e1:0893 Syntek DV4000
spca561 060b:a001 Maxell Compact Pc PM3
@@ -178,6 +181,7 @@ spca506 06e1:a190 ADS Instant VCD
ov534 06f8:3002 Hercules Blog Webcam
ov534 06f8:3003 Hercules Dualpix HD Weblog
sonixj 06f8:3004 Hercules Classic Silver
+sonixj 06f8:3008 Hercules Deluxe Optical Glass
spca508 0733:0110 ViewQuest VQ110
spca508 0130:0130 Clone Digital Webcam 11043
spca501 0733:0401 Intel Create and Share
@@ -209,6 +213,7 @@ sunplus 08ca:2050 Medion MD 41437
sunplus 08ca:2060 Aiptek PocketDV5300
tv8532 0923:010f ICM532 cams
mars 093a:050f Mars-Semi Pc-Camera
+mr97310a 093a:010f Sakar Digital no. 77379
pac207 093a:2460 Qtec Webcam 100
pac207 093a:2461 HP Webcam
pac207 093a:2463 Philips SPC 220 NC
@@ -265,6 +270,11 @@ sonixj 0c45:60ec SN9C105+MO4000
sonixj 0c45:60fb Surfer NoName
sonixj 0c45:60fc LG-LIC300
sonixj 0c45:60fe Microdia Audio
+sonixj 0c45:6100 PC Camera (SN9C128)
+sonixj 0c45:610a PC Camera (SN9C128)
+sonixj 0c45:610b PC Camera (SN9C128)
+sonixj 0c45:610c PC Camera (SN9C128)
+sonixj 0c45:610e PC Camera (SN9C128)
sonixj 0c45:6128 Microdia/Sonix SNP325
sonixj 0c45:612a Avant Camera
sonixj 0c45:612c Typhoon Rasy Cam 1.3MPix
@@ -274,6 +284,28 @@ sonixj 0c45:613a Microdia Sonix PC Camera
sonixj 0c45:613b Surfer SN-206
sonixj 0c45:613c Sonix Pccam168
sonixj 0c45:6143 Sonix Pccam168
+sn9c20x 0c45:6240 PC Camera (SN9C201 + MT9M001)
+sn9c20x 0c45:6242 PC Camera (SN9C201 + MT9M111)
+sn9c20x 0c45:6248 PC Camera (SN9C201 + OV9655)
+sn9c20x 0c45:624e PC Camera (SN9C201 + SOI968)
+sn9c20x 0c45:624f PC Camera (SN9C201 + OV9650)
+sn9c20x 0c45:6251 PC Camera (SN9C201 + OV9650)
+sn9c20x 0c45:6253 PC Camera (SN9C201 + OV9650)
+sn9c20x 0c45:6260 PC Camera (SN9C201 + OV7670)
+sn9c20x 0c45:6270 PC Camera (SN9C201 + MT9V011/MT9V111/MT9V112)
+sn9c20x 0c45:627b PC Camera (SN9C201 + OV7660)
+sn9c20x 0c45:627c PC Camera (SN9C201 + HV7131R)
+sn9c20x 0c45:627f PC Camera (SN9C201 + OV9650)
+sn9c20x 0c45:6280 PC Camera (SN9C202 + MT9M001)
+sn9c20x 0c45:6282 PC Camera (SN9C202 + MT9M111)
+sn9c20x 0c45:6288 PC Camera (SN9C202 + OV9655)
+sn9c20x 0c45:628e PC Camera (SN9C202 + SOI968)
+sn9c20x 0c45:628f PC Camera (SN9C202 + OV9650)
+sn9c20x 0c45:62a0 PC Camera (SN9C202 + OV7670)
+sn9c20x 0c45:62b0 PC Camera (SN9C202 + MT9V011/MT9V111/MT9V112)
+sn9c20x 0c45:62b3 PC Camera (SN9C202 + OV9655)
+sn9c20x 0c45:62bb PC Camera (SN9C202 + OV7660)
+sn9c20x 0c45:62bc PC Camera (SN9C202 + HV7131R)
sunplus 0d64:0303 Sunplus FashionCam DXG
etoms 102c:6151 Qcam Sangha CIF
etoms 102c:6251 Qcam xxxxxx VGA
@@ -282,6 +314,7 @@ spca561 10fd:7e50 FlyCam Usb 100
zc3xx 10fd:8050 Typhoon Webshot II USB 300k
ov534 1415:2000 Sony HD Eye for PS3 (SLEH 00201)
pac207 145f:013a Trust WB-1300N
+sn9c20x 145f:013d Trust WB-3600R
vc032x 15b8:6001 HP 2.0 Megapixel
vc032x 15b8:6002 HP 2.0 Megapixel rz406aa
spca501 1776:501c Arowana 300K CMOS Camera
@@ -292,4 +325,11 @@ spca500 2899:012c Toptro Industrial
spca508 8086:0110 Intel Easy PC Camera
spca500 8086:0630 Intel Pocket PC Camera
spca506 99fa:8988 Grandtec V.cap
+sn9c20x a168:0610 Dino-Lite Digital Microscope (SN9C201 + HV7131R)
+sn9c20x a168:0611 Dino-Lite Digital Microscope (SN9C201 + HV7131R)
+sn9c20x a168:0613 Dino-Lite Digital Microscope (SN9C201 + HV7131R)
+sn9c20x a168:0618 Dino-Lite Digital Microscope (SN9C201 + HV7131R)
+sn9c20x a168:0614 Dino-Lite Digital Microscope (SN9C201 + MT9M111)
+sn9c20x a168:0615 Dino-Lite Digital Microscope (SN9C201 + MT9M111)
+sn9c20x a168:0617 Dino-Lite Digital Microscope (SN9C201 + MT9M111)
spca561 abcd:cdee Petcam
diff --git a/Documentation/video4linux/pxa_camera.txt b/Documentation/video4linux/pxa_camera.txt
index b1137f9a53e..4f6d0ca0195 100644
--- a/Documentation/video4linux/pxa_camera.txt
+++ b/Documentation/video4linux/pxa_camera.txt
@@ -26,6 +26,55 @@ Global video workflow
Once the last buffer is filled in, the QCI interface stops.
+ c) Capture global finite state machine schema
+
+ +----+ +---+ +----+
+ | DQ | | Q | | DQ |
+ | v | v | v
+ +-----------+ +------------------------+
+ | STOP | | Wait for capture start |
+ +-----------+ Q +------------------------+
++-> | QCI: stop | ------------------> | QCI: run | <------------+
+| | DMA: stop | | DMA: stop | |
+| +-----------+ +-----> +------------------------+ |
+| / | |
+| / +---+ +----+ | |
+|capture list empty / | Q | | DQ | | QCI Irq EOF |
+| / | v | v v |
+| +--------------------+ +----------------------+ |
+| | DMA hotlink missed | | Capture running | |
+| +--------------------+ +----------------------+ |
+| | QCI: run | +-----> | QCI: run | <-+ |
+| | DMA: stop | / | DMA: run | | |
+| +--------------------+ / +----------------------+ | Other |
+| ^ /DMA still | | channels |
+| | capture list / running | DMA Irq End | not |
+| | not empty / | | finished |
+| | / v | yet |
+| +----------------------+ +----------------------+ | |
+| | Videobuf released | | Channel completed | | |
+| +----------------------+ +----------------------+ | |
++-- | QCI: run | | QCI: run | --+ |
+ | DMA: run | | DMA: run | |
+ +----------------------+ +----------------------+ |
+ ^ / | |
+ | no overrun / | overrun |
+ | / v |
+ +--------------------+ / +----------------------+ |
+ | Frame completed | / | Frame overran | |
+ +--------------------+ <-----+ +----------------------+ restart frame |
+ | QCI: run | | QCI: stop | --------------+
+ | DMA: run | | DMA: stop |
+ +--------------------+ +----------------------+
+
+ Legend: - each box is a FSM state
+ - each arrow is the condition to transition to another state
+ - an arrow with a comment is a mandatory transition (no condition)
+ - arrow "Q" means : a buffer was enqueued
+ - arrow "DQ" means : a buffer was dequeued
+ - "QCI: stop" means the QCI interface is not enabled
+ - "DMA: stop" means all 3 DMA channels are stopped
+ - "DMA: run" means at least 1 DMA channel is still running
DMA usage
---------
diff --git a/Documentation/video4linux/v4l2-framework.txt b/Documentation/video4linux/v4l2-framework.txt
index 854808b67fa..ba4706afc5f 100644
--- a/Documentation/video4linux/v4l2-framework.txt
+++ b/Documentation/video4linux/v4l2-framework.txt
@@ -89,6 +89,11 @@ from dev (driver name followed by the bus_id, to be precise). If you set it
up before calling v4l2_device_register then it will be untouched. If dev is
NULL, then you *must* setup v4l2_dev->name before calling v4l2_device_register.
+You can use v4l2_device_set_name() to set the name based on a driver name and
+a driver-global atomic_t instance. This will generate names like ivtv0, ivtv1,
+etc. If the name ends with a digit, then it will insert a dash: cx18-0,
+cx18-1, etc. This function returns the instance number.
+
The first 'dev' argument is normally the struct device pointer of a pci_dev,
usb_interface or platform_device. It is rare for dev to be NULL, but it happens
with ISA devices or when one device creates multiple PCI devices, thus making
@@ -385,6 +390,30 @@ later date. It differs between i2c drivers and as such can be confusing.
To see which chip variants are supported you can look in the i2c driver code
for the i2c_device_id table. This lists all the possibilities.
+There are two more helper functions:
+
+v4l2_i2c_new_subdev_cfg: this function adds new irq and platform_data
+arguments and has both 'addr' and 'probed_addrs' arguments: if addr is not
+0 then that will be used (non-probing variant), otherwise the probed_addrs
+are probed.
+
+For example: this will probe for address 0x10:
+
+struct v4l2_subdev *sd = v4l2_i2c_new_subdev_cfg(v4l2_dev, adapter,
+ "module_foo", "chipid", 0, NULL, 0, I2C_ADDRS(0x10));
+
+v4l2_i2c_new_subdev_board uses an i2c_board_info struct which is passed
+to the i2c driver and replaces the irq, platform_data and addr arguments.
+
+If the subdev supports the s_config core ops, then that op is called with
+the irq and platform_data arguments after the subdev was setup. The older
+v4l2_i2c_new_(probed_)subdev functions will call s_config as well, but with
+irq set to 0 and platform_data set to NULL.
+
+Note that in the next kernel release the functions v4l2_i2c_new_subdev,
+v4l2_i2c_new_probed_subdev and v4l2_i2c_new_probed_subdev_addr will all be
+replaced by a single v4l2_i2c_new_subdev that is identical to
+v4l2_i2c_new_subdev_cfg but without the irq and platform_data arguments.
struct video_device
-------------------
diff --git a/Documentation/vm/Makefile b/Documentation/vm/Makefile
index 6f562f778b2..5bd269b3731 100644
--- a/Documentation/vm/Makefile
+++ b/Documentation/vm/Makefile
@@ -2,7 +2,7 @@
obj- := dummy.o
# List of programs to build
-hostprogs-y := slabinfo
+hostprogs-y := slabinfo page-types
# Tell kbuild to always build the programs
always := $(hostprogs-y)
diff --git a/Documentation/vm/balance b/Documentation/vm/balance
index bd3d31bc491..c46e68cf934 100644
--- a/Documentation/vm/balance
+++ b/Documentation/vm/balance
@@ -75,15 +75,15 @@ Page stealing from process memory and shm is done if stealing the page would
alleviate memory pressure on any zone in the page's node that has fallen below
its watermark.
-pages_min/pages_low/pages_high/low_on_memory/zone_wake_kswapd: These are
-per-zone fields, used to determine when a zone needs to be balanced. When
-the number of pages falls below pages_min, the hysteric field low_on_memory
-gets set. This stays set till the number of free pages becomes pages_high.
-When low_on_memory is set, page allocation requests will try to free some
-pages in the zone (providing GFP_WAIT is set in the request). Orthogonal
-to this, is the decision to poke kswapd to free some zone pages. That
-decision is not hysteresis based, and is done when the number of free
-pages is below pages_low; in which case zone_wake_kswapd is also set.
+watemark[WMARK_MIN/WMARK_LOW/WMARK_HIGH]/low_on_memory/zone_wake_kswapd: These
+are per-zone fields, used to determine when a zone needs to be balanced. When
+the number of pages falls below watermark[WMARK_MIN], the hysteric field
+low_on_memory gets set. This stays set till the number of free pages becomes
+watermark[WMARK_HIGH]. When low_on_memory is set, page allocation requests will
+try to free some pages in the zone (providing GFP_WAIT is set in the request).
+Orthogonal to this, is the decision to poke kswapd to free some zone pages.
+That decision is not hysteresis based, and is done when the number of free
+pages is below watermark[WMARK_LOW]; in which case zone_wake_kswapd is also set.
(Good) Ideas that I have heard:
diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
new file mode 100644
index 00000000000..0833f44ba16
--- /dev/null
+++ b/Documentation/vm/page-types.c
@@ -0,0 +1,698 @@
+/*
+ * page-types: Tool for querying page flags
+ *
+ * Copyright (C) 2009 Intel corporation
+ * Copyright (C) 2009 Wu Fengguang <fengguang.wu@intel.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdarg.h>
+#include <string.h>
+#include <getopt.h>
+#include <limits.h>
+#include <sys/types.h>
+#include <sys/errno.h>
+#include <sys/fcntl.h>
+
+
+/*
+ * kernel page flags
+ */
+
+#define KPF_BYTES 8
+#define PROC_KPAGEFLAGS "/proc/kpageflags"
+
+/* copied from kpageflags_read() */
+#define KPF_LOCKED 0
+#define KPF_ERROR 1
+#define KPF_REFERENCED 2
+#define KPF_UPTODATE 3
+#define KPF_DIRTY 4
+#define KPF_LRU 5
+#define KPF_ACTIVE 6
+#define KPF_SLAB 7
+#define KPF_WRITEBACK 8
+#define KPF_RECLAIM 9
+#define KPF_BUDDY 10
+
+/* [11-20] new additions in 2.6.31 */
+#define KPF_MMAP 11
+#define KPF_ANON 12
+#define KPF_SWAPCACHE 13
+#define KPF_SWAPBACKED 14
+#define KPF_COMPOUND_HEAD 15
+#define KPF_COMPOUND_TAIL 16
+#define KPF_HUGE 17
+#define KPF_UNEVICTABLE 18
+#define KPF_NOPAGE 20
+
+/* [32-] kernel hacking assistances */
+#define KPF_RESERVED 32
+#define KPF_MLOCKED 33
+#define KPF_MAPPEDTODISK 34
+#define KPF_PRIVATE 35
+#define KPF_PRIVATE_2 36
+#define KPF_OWNER_PRIVATE 37
+#define KPF_ARCH 38
+#define KPF_UNCACHED 39
+
+/* [48-] take some arbitrary free slots for expanding overloaded flags
+ * not part of kernel API
+ */
+#define KPF_READAHEAD 48
+#define KPF_SLOB_FREE 49
+#define KPF_SLUB_FROZEN 50
+#define KPF_SLUB_DEBUG 51
+
+#define KPF_ALL_BITS ((uint64_t)~0ULL)
+#define KPF_HACKERS_BITS (0xffffULL << 32)
+#define KPF_OVERLOADED_BITS (0xffffULL << 48)
+#define BIT(name) (1ULL << KPF_##name)
+#define BITS_COMPOUND (BIT(COMPOUND_HEAD) | BIT(COMPOUND_TAIL))
+
+static char *page_flag_names[] = {
+ [KPF_LOCKED] = "L:locked",
+ [KPF_ERROR] = "E:error",
+ [KPF_REFERENCED] = "R:referenced",
+ [KPF_UPTODATE] = "U:uptodate",
+ [KPF_DIRTY] = "D:dirty",
+ [KPF_LRU] = "l:lru",
+ [KPF_ACTIVE] = "A:active",
+ [KPF_SLAB] = "S:slab",
+ [KPF_WRITEBACK] = "W:writeback",
+ [KPF_RECLAIM] = "I:reclaim",
+ [KPF_BUDDY] = "B:buddy",
+
+ [KPF_MMAP] = "M:mmap",
+ [KPF_ANON] = "a:anonymous",
+ [KPF_SWAPCACHE] = "s:swapcache",
+ [KPF_SWAPBACKED] = "b:swapbacked",
+ [KPF_COMPOUND_HEAD] = "H:compound_head",
+ [KPF_COMPOUND_TAIL] = "T:compound_tail",
+ [KPF_HUGE] = "G:huge",
+ [KPF_UNEVICTABLE] = "u:unevictable",
+ [KPF_NOPAGE] = "n:nopage",
+
+ [KPF_RESERVED] = "r:reserved",
+ [KPF_MLOCKED] = "m:mlocked",
+ [KPF_MAPPEDTODISK] = "d:mappedtodisk",
+ [KPF_PRIVATE] = "P:private",
+ [KPF_PRIVATE_2] = "p:private_2",
+ [KPF_OWNER_PRIVATE] = "O:owner_private",
+ [KPF_ARCH] = "h:arch",
+ [KPF_UNCACHED] = "c:uncached",
+
+ [KPF_READAHEAD] = "I:readahead",
+ [KPF_SLOB_FREE] = "P:slob_free",
+ [KPF_SLUB_FROZEN] = "A:slub_frozen",
+ [KPF_SLUB_DEBUG] = "E:slub_debug",
+};
+
+
+/*
+ * data structures
+ */
+
+static int opt_raw; /* for kernel developers */
+static int opt_list; /* list pages (in ranges) */
+static int opt_no_summary; /* don't show summary */
+static pid_t opt_pid; /* process to walk */
+
+#define MAX_ADDR_RANGES 1024
+static int nr_addr_ranges;
+static unsigned long opt_offset[MAX_ADDR_RANGES];
+static unsigned long opt_size[MAX_ADDR_RANGES];
+
+#define MAX_BIT_FILTERS 64
+static int nr_bit_filters;
+static uint64_t opt_mask[MAX_BIT_FILTERS];
+static uint64_t opt_bits[MAX_BIT_FILTERS];
+
+static int page_size;
+
+#define PAGES_BATCH (64 << 10) /* 64k pages */
+static int kpageflags_fd;
+static uint64_t kpageflags_buf[KPF_BYTES * PAGES_BATCH];
+
+#define HASH_SHIFT 13
+#define HASH_SIZE (1 << HASH_SHIFT)
+#define HASH_MASK (HASH_SIZE - 1)
+#define HASH_KEY(flags) (flags & HASH_MASK)
+
+static unsigned long total_pages;
+static unsigned long nr_pages[HASH_SIZE];
+static uint64_t page_flags[HASH_SIZE];
+
+
+/*
+ * helper functions
+ */
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
+#define min_t(type, x, y) ({ \
+ type __min1 = (x); \
+ type __min2 = (y); \
+ __min1 < __min2 ? __min1 : __min2; })
+
+unsigned long pages2mb(unsigned long pages)
+{
+ return (pages * page_size) >> 20;
+}
+
+void fatal(const char *x, ...)
+{
+ va_list ap;
+
+ va_start(ap, x);
+ vfprintf(stderr, x, ap);
+ va_end(ap);
+ exit(EXIT_FAILURE);
+}
+
+
+/*
+ * page flag names
+ */
+
+char *page_flag_name(uint64_t flags)
+{
+ static char buf[65];
+ int present;
+ int i, j;
+
+ for (i = 0, j = 0; i < ARRAY_SIZE(page_flag_names); i++) {
+ present = (flags >> i) & 1;
+ if (!page_flag_names[i]) {
+ if (present)
+ fatal("unkown flag bit %d\n", i);
+ continue;
+ }
+ buf[j++] = present ? page_flag_names[i][0] : '_';
+ }
+
+ return buf;
+}
+
+char *page_flag_longname(uint64_t flags)
+{
+ static char buf[1024];
+ int i, n;
+
+ for (i = 0, n = 0; i < ARRAY_SIZE(page_flag_names); i++) {
+ if (!page_flag_names[i])
+ continue;
+ if ((flags >> i) & 1)
+ n += snprintf(buf + n, sizeof(buf) - n, "%s,",
+ page_flag_names[i] + 2);
+ }
+ if (n)
+ n--;
+ buf[n] = '\0';
+
+ return buf;
+}
+
+
+/*
+ * page list and summary
+ */
+
+void show_page_range(unsigned long offset, uint64_t flags)
+{
+ static uint64_t flags0;
+ static unsigned long index;
+ static unsigned long count;
+
+ if (flags == flags0 && offset == index + count) {
+ count++;
+ return;
+ }
+
+ if (count)
+ printf("%lu\t%lu\t%s\n",
+ index, count, page_flag_name(flags0));
+
+ flags0 = flags;
+ index = offset;
+ count = 1;
+}
+
+void show_page(unsigned long offset, uint64_t flags)
+{
+ printf("%lu\t%s\n", offset, page_flag_name(flags));
+}
+
+void show_summary(void)
+{
+ int i;
+
+ printf(" flags\tpage-count MB"
+ " symbolic-flags\t\t\tlong-symbolic-flags\n");
+
+ for (i = 0; i < ARRAY_SIZE(nr_pages); i++) {
+ if (nr_pages[i])
+ printf("0x%016llx\t%10lu %8lu %s\t%s\n",
+ (unsigned long long)page_flags[i],
+ nr_pages[i],
+ pages2mb(nr_pages[i]),
+ page_flag_name(page_flags[i]),
+ page_flag_longname(page_flags[i]));
+ }
+
+ printf(" total\t%10lu %8lu\n",
+ total_pages, pages2mb(total_pages));
+}
+
+
+/*
+ * page flag filters
+ */
+
+int bit_mask_ok(uint64_t flags)
+{
+ int i;
+
+ for (i = 0; i < nr_bit_filters; i++) {
+ if (opt_bits[i] == KPF_ALL_BITS) {
+ if ((flags & opt_mask[i]) == 0)
+ return 0;
+ } else {
+ if ((flags & opt_mask[i]) != opt_bits[i])
+ return 0;
+ }
+ }
+
+ return 1;
+}
+
+uint64_t expand_overloaded_flags(uint64_t flags)
+{
+ /* SLOB/SLUB overload several page flags */
+ if (flags & BIT(SLAB)) {
+ if (flags & BIT(PRIVATE))
+ flags ^= BIT(PRIVATE) | BIT(SLOB_FREE);
+ if (flags & BIT(ACTIVE))
+ flags ^= BIT(ACTIVE) | BIT(SLUB_FROZEN);
+ if (flags & BIT(ERROR))
+ flags ^= BIT(ERROR) | BIT(SLUB_DEBUG);
+ }
+
+ /* PG_reclaim is overloaded as PG_readahead in the read path */
+ if ((flags & (BIT(RECLAIM) | BIT(WRITEBACK))) == BIT(RECLAIM))
+ flags ^= BIT(RECLAIM) | BIT(READAHEAD);
+
+ return flags;
+}
+
+uint64_t well_known_flags(uint64_t flags)
+{
+ /* hide flags intended only for kernel hacker */
+ flags &= ~KPF_HACKERS_BITS;
+
+ /* hide non-hugeTLB compound pages */
+ if ((flags & BITS_COMPOUND) && !(flags & BIT(HUGE)))
+ flags &= ~BITS_COMPOUND;
+
+ return flags;
+}
+
+
+/*
+ * page frame walker
+ */
+
+int hash_slot(uint64_t flags)
+{
+ int k = HASH_KEY(flags);
+ int i;
+
+ /* Explicitly reserve slot 0 for flags 0: the following logic
+ * cannot distinguish an unoccupied slot from slot (flags==0).
+ */
+ if (flags == 0)
+ return 0;
+
+ /* search through the remaining (HASH_SIZE-1) slots */
+ for (i = 1; i < ARRAY_SIZE(page_flags); i++, k++) {
+ if (!k || k >= ARRAY_SIZE(page_flags))
+ k = 1;
+ if (page_flags[k] == 0) {
+ page_flags[k] = flags;
+ return k;
+ }
+ if (page_flags[k] == flags)
+ return k;
+ }
+
+ fatal("hash table full: bump up HASH_SHIFT?\n");
+ exit(EXIT_FAILURE);
+}
+
+void add_page(unsigned long offset, uint64_t flags)
+{
+ flags = expand_overloaded_flags(flags);
+
+ if (!opt_raw)
+ flags = well_known_flags(flags);
+
+ if (!bit_mask_ok(flags))
+ return;
+
+ if (opt_list == 1)
+ show_page_range(offset, flags);
+ else if (opt_list == 2)
+ show_page(offset, flags);
+
+ nr_pages[hash_slot(flags)]++;
+ total_pages++;
+}
+
+void walk_pfn(unsigned long index, unsigned long count)
+{
+ unsigned long batch;
+ unsigned long n;
+ unsigned long i;
+
+ if (index > ULONG_MAX / KPF_BYTES)
+ fatal("index overflow: %lu\n", index);
+
+ lseek(kpageflags_fd, index * KPF_BYTES, SEEK_SET);
+
+ while (count) {
+ batch = min_t(unsigned long, count, PAGES_BATCH);
+ n = read(kpageflags_fd, kpageflags_buf, batch * KPF_BYTES);
+ if (n == 0)
+ break;
+ if (n < 0) {
+ perror(PROC_KPAGEFLAGS);
+ exit(EXIT_FAILURE);
+ }
+
+ if (n % KPF_BYTES != 0)
+ fatal("partial read: %lu bytes\n", n);
+ n = n / KPF_BYTES;
+
+ for (i = 0; i < n; i++)
+ add_page(index + i, kpageflags_buf[i]);
+
+ index += batch;
+ count -= batch;
+ }
+}
+
+void walk_addr_ranges(void)
+{
+ int i;
+
+ kpageflags_fd = open(PROC_KPAGEFLAGS, O_RDONLY);
+ if (kpageflags_fd < 0) {
+ perror(PROC_KPAGEFLAGS);
+ exit(EXIT_FAILURE);
+ }
+
+ if (!nr_addr_ranges)
+ walk_pfn(0, ULONG_MAX);
+
+ for (i = 0; i < nr_addr_ranges; i++)
+ walk_pfn(opt_offset[i], opt_size[i]);
+
+ close(kpageflags_fd);
+}
+
+
+/*
+ * user interface
+ */
+
+const char *page_flag_type(uint64_t flag)
+{
+ if (flag & KPF_HACKERS_BITS)
+ return "(r)";
+ if (flag & KPF_OVERLOADED_BITS)
+ return "(o)";
+ return " ";
+}
+
+void usage(void)
+{
+ int i, j;
+
+ printf(
+"page-types [options]\n"
+" -r|--raw Raw mode, for kernel developers\n"
+" -a|--addr addr-spec Walk a range of pages\n"
+" -b|--bits bits-spec Walk pages with specified bits\n"
+#if 0 /* planned features */
+" -p|--pid pid Walk process address space\n"
+" -f|--file filename Walk file address space\n"
+#endif
+" -l|--list Show page details in ranges\n"
+" -L|--list-each Show page details one by one\n"
+" -N|--no-summary Don't show summay info\n"
+" -h|--help Show this usage message\n"
+"addr-spec:\n"
+" N one page at offset N (unit: pages)\n"
+" N+M pages range from N to N+M-1\n"
+" N,M pages range from N to M-1\n"
+" N, pages range from N to end\n"
+" ,M pages range from 0 to M\n"
+"bits-spec:\n"
+" bit1,bit2 (flags & (bit1|bit2)) != 0\n"
+" bit1,bit2=bit1 (flags & (bit1|bit2)) == bit1\n"
+" bit1,~bit2 (flags & (bit1|bit2)) == bit1\n"
+" =bit1,bit2 flags == (bit1|bit2)\n"
+"bit-names:\n"
+ );
+
+ for (i = 0, j = 0; i < ARRAY_SIZE(page_flag_names); i++) {
+ if (!page_flag_names[i])
+ continue;
+ printf("%16s%s", page_flag_names[i] + 2,
+ page_flag_type(1ULL << i));
+ if (++j > 3) {
+ j = 0;
+ putchar('\n');
+ }
+ }
+ printf("\n "
+ "(r) raw mode bits (o) overloaded bits\n");
+}
+
+unsigned long long parse_number(const char *str)
+{
+ unsigned long long n;
+
+ n = strtoll(str, NULL, 0);
+
+ if (n == 0 && str[0] != '0')
+ fatal("invalid name or number: %s\n", str);
+
+ return n;
+}
+
+void parse_pid(const char *str)
+{
+ opt_pid = parse_number(str);
+}
+
+void parse_file(const char *name)
+{
+}
+
+void add_addr_range(unsigned long offset, unsigned long size)
+{
+ if (nr_addr_ranges >= MAX_ADDR_RANGES)
+ fatal("too much addr ranges\n");
+
+ opt_offset[nr_addr_ranges] = offset;
+ opt_size[nr_addr_ranges] = size;
+ nr_addr_ranges++;
+}
+
+void parse_addr_range(const char *optarg)
+{
+ unsigned long offset;
+ unsigned long size;
+ char *p;
+
+ p = strchr(optarg, ',');
+ if (!p)
+ p = strchr(optarg, '+');
+
+ if (p == optarg) {
+ offset = 0;
+ size = parse_number(p + 1);
+ } else if (p) {
+ offset = parse_number(optarg);
+ if (p[1] == '\0')
+ size = ULONG_MAX;
+ else {
+ size = parse_number(p + 1);
+ if (*p == ',') {
+ if (size < offset)
+ fatal("invalid range: %lu,%lu\n",
+ offset, size);
+ size -= offset;
+ }
+ }
+ } else {
+ offset = parse_number(optarg);
+ size = 1;
+ }
+
+ add_addr_range(offset, size);
+}
+
+void add_bits_filter(uint64_t mask, uint64_t bits)
+{
+ if (nr_bit_filters >= MAX_BIT_FILTERS)
+ fatal("too much bit filters\n");
+
+ opt_mask[nr_bit_filters] = mask;
+ opt_bits[nr_bit_filters] = bits;
+ nr_bit_filters++;
+}
+
+uint64_t parse_flag_name(const char *str, int len)
+{
+ int i;
+
+ if (!*str || !len)
+ return 0;
+
+ if (len <= 8 && !strncmp(str, "compound", len))
+ return BITS_COMPOUND;
+
+ for (i = 0; i < ARRAY_SIZE(page_flag_names); i++) {
+ if (!page_flag_names[i])
+ continue;
+ if (!strncmp(str, page_flag_names[i] + 2, len))
+ return 1ULL << i;
+ }
+
+ return parse_number(str);
+}
+
+uint64_t parse_flag_names(const char *str, int all)
+{
+ const char *p = str;
+ uint64_t flags = 0;
+
+ while (1) {
+ if (*p == ',' || *p == '=' || *p == '\0') {
+ if ((*str != '~') || (*str == '~' && all && *++str))
+ flags |= parse_flag_name(str, p - str);
+ if (*p != ',')
+ break;
+ str = p + 1;
+ }
+ p++;
+ }
+
+ return flags;
+}
+
+void parse_bits_mask(const char *optarg)
+{
+ uint64_t mask;
+ uint64_t bits;
+ const char *p;
+
+ p = strchr(optarg, '=');
+ if (p == optarg) {
+ mask = KPF_ALL_BITS;
+ bits = parse_flag_names(p + 1, 0);
+ } else if (p) {
+ mask = parse_flag_names(optarg, 0);
+ bits = parse_flag_names(p + 1, 0);
+ } else if (strchr(optarg, '~')) {
+ mask = parse_flag_names(optarg, 1);
+ bits = parse_flag_names(optarg, 0);
+ } else {
+ mask = parse_flag_names(optarg, 0);
+ bits = KPF_ALL_BITS;
+ }
+
+ add_bits_filter(mask, bits);
+}
+
+
+struct option opts[] = {
+ { "raw" , 0, NULL, 'r' },
+ { "pid" , 1, NULL, 'p' },
+ { "file" , 1, NULL, 'f' },
+ { "addr" , 1, NULL, 'a' },
+ { "bits" , 1, NULL, 'b' },
+ { "list" , 0, NULL, 'l' },
+ { "list-each" , 0, NULL, 'L' },
+ { "no-summary", 0, NULL, 'N' },
+ { "help" , 0, NULL, 'h' },
+ { NULL , 0, NULL, 0 }
+};
+
+int main(int argc, char *argv[])
+{
+ int c;
+
+ page_size = getpagesize();
+
+ while ((c = getopt_long(argc, argv,
+ "rp:f:a:b:lLNh", opts, NULL)) != -1) {
+ switch (c) {
+ case 'r':
+ opt_raw = 1;
+ break;
+ case 'p':
+ parse_pid(optarg);
+ break;
+ case 'f':
+ parse_file(optarg);
+ break;
+ case 'a':
+ parse_addr_range(optarg);
+ break;
+ case 'b':
+ parse_bits_mask(optarg);
+ break;
+ case 'l':
+ opt_list = 1;
+ break;
+ case 'L':
+ opt_list = 2;
+ break;
+ case 'N':
+ opt_no_summary = 1;
+ break;
+ case 'h':
+ usage();
+ exit(0);
+ default:
+ usage();
+ exit(1);
+ }
+ }
+
+ if (opt_list == 1)
+ printf("offset\tcount\tflags\n");
+ if (opt_list == 2)
+ printf("offset\tflags\n");
+
+ walk_addr_ranges();
+
+ if (opt_list == 1)
+ show_page_range(0, 0); /* drain the buffer */
+
+ if (opt_no_summary)
+ return 0;
+
+ if (opt_list)
+ printf("\n\n");
+
+ show_summary();
+
+ return 0;
+}
diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index ce72c0fe617..600a304a828 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -12,9 +12,9 @@ There are three components to pagemap:
value for each virtual page, containing the following data (from
fs/proc/task_mmu.c, above pagemap_read):
- * Bits 0-55 page frame number (PFN) if present
+ * Bits 0-54 page frame number (PFN) if present
* Bits 0-4 swap type if swapped
- * Bits 5-55 swap offset if swapped
+ * Bits 5-54 swap offset if swapped
* Bits 55-60 page shift (page size = 1<<page shift)
* Bit 61 reserved for future use
* Bit 62 page swapped
@@ -36,7 +36,7 @@ There are three components to pagemap:
* /proc/kpageflags. This file contains a 64-bit set of flags for each
page, indexed by PFN.
- The flags are (from fs/proc/proc_misc, above kpageflags_read):
+ The flags are (from fs/proc/page.c, above kpageflags_read):
0. LOCKED
1. ERROR
@@ -49,6 +49,68 @@ There are three components to pagemap:
8. WRITEBACK
9. RECLAIM
10. BUDDY
+ 11. MMAP
+ 12. ANON
+ 13. SWAPCACHE
+ 14. SWAPBACKED
+ 15. COMPOUND_HEAD
+ 16. COMPOUND_TAIL
+ 16. HUGE
+ 18. UNEVICTABLE
+ 20. NOPAGE
+
+Short descriptions to the page flags:
+
+ 0. LOCKED
+ page is being locked for exclusive access, eg. by undergoing read/write IO
+
+ 7. SLAB
+ page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator
+ When compound page is used, SLUB/SLQB will only set this flag on the head
+ page; SLOB will not flag it at all.
+
+10. BUDDY
+ a free memory block managed by the buddy system allocator
+ The buddy system organizes free memory in blocks of various orders.
+ An order N block has 2^N physically contiguous pages, with the BUDDY flag
+ set for and _only_ for the first page.
+
+15. COMPOUND_HEAD
+16. COMPOUND_TAIL
+ A compound page with order N consists of 2^N physically contiguous pages.
+ A compound page with order 2 takes the form of "HTTT", where H donates its
+ head page and T donates its tail page(s). The major consumers of compound
+ pages are hugeTLB pages (Documentation/vm/hugetlbpage.txt), the SLUB etc.
+ memory allocators and various device drivers. However in this interface,
+ only huge/giga pages are made visible to end users.
+17. HUGE
+ this is an integral part of a HugeTLB page
+
+20. NOPAGE
+ no page frame exists at the requested address
+
+ [IO related page flags]
+ 1. ERROR IO error occurred
+ 3. UPTODATE page has up-to-date data
+ ie. for file backed page: (in-memory data revision >= on-disk one)
+ 4. DIRTY page has been written to, hence contains new data
+ ie. for file backed page: (in-memory data revision > on-disk one)
+ 8. WRITEBACK page is being synced to disk
+
+ [LRU related page flags]
+ 5. LRU page is in one of the LRU lists
+ 6. ACTIVE page is in the active LRU list
+18. UNEVICTABLE page is in the unevictable (non-)LRU list
+ It is somehow pinned and not a candidate for LRU page reclaims,
+ eg. ramfs pages, shmctl(SHM_LOCK) and mlock() memory segments
+ 2. REFERENCED page has been referenced since last LRU list enqueue/requeue
+ 9. RECLAIM page will be reclaimed soon after its pageout IO completed
+11. MMAP a memory mapped page
+12. ANON a memory mapped page that is not part of a file
+13. SWAPCACHE page is mapped to swap space, ie. has an associated swap entry
+14. SWAPBACKED page is backed by swap/RAM
+
+The page-types tool in this directory can be used to query the above flags.
Using pagemap to do something useful:
diff --git a/Documentation/watchdog/hpwdt.txt b/Documentation/watchdog/hpwdt.txt
new file mode 100644
index 00000000000..9c24d5ffbb0
--- /dev/null
+++ b/Documentation/watchdog/hpwdt.txt
@@ -0,0 +1,95 @@
+Last reviewed: 06/02/2009
+
+ HP iLO2 NMI Watchdog Driver
+ NMI sourcing for iLO2 based ProLiant Servers
+ Documentation and Driver by
+ Thomas Mingarelli <thomas.mingarelli@hp.com>
+
+ The HP iLO2 NMI Watchdog driver is a kernel module that provides basic
+ watchdog functionality and the added benefit of NMI sourcing. Both the
+ watchdog functionality and the NMI sourcing capability need to be enabled
+ by the user. Remember that the two modes are not dependant on one another.
+ A user can have the NMI sourcing without the watchdog timer and vice-versa.
+
+ Watchdog functionality is enabled like any other common watchdog driver. That
+ is, an application needs to be started that kicks off the watchdog timer. A
+ basic application exists in the Documentation/watchdog/src directory called
+ watchdog-test.c. Simply compile the C file and kick it off. If the system
+ gets into a bad state and hangs, the HP ProLiant iLO 2 timer register will
+ not be updated in a timely fashion and a hardware system reset (also known as
+ an Automatic Server Recovery (ASR)) event will occur.
+
+ The hpwdt driver also has four (4) module parameters. They are the following:
+
+ soft_margin - allows the user to set the watchdog timer value
+ allow_kdump - allows the user to save off a kernel dump image after an NMI
+ nowayout - basic watchdog parameter that does not allow the timer to
+ be restarted or an impending ASR to be escaped.
+ priority - determines whether or not the hpwdt driver is first on the
+ die_notify list to handle NMIs or last. The default value
+ for this module parameter is 0 or LAST. If the user wants to
+ enable NMI sourcing then reload the hpwdt driver with
+ priority=1 (and boot with nmi_watchdog=0).
+
+ NOTE: More information about watchdog drivers in general, including the ioctl
+ interface to /dev/watchdog can be found in
+ Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt.
+
+ The priority parameter was introduced due to other kernel software that relied
+ on handling NMIs (like oprofile). Keeping hpwdt's priority at 0 (or LAST)
+ enables the users of NMIs for non critical events to be work as expected.
+
+ The NMI sourcing capability is disabled by default due to the inability to
+ distinguish between "NMI Watchdog Ticks" and "HW generated NMI events" in the
+ Linux kernel. What this means is that the hpwdt nmi handler code is called
+ each time the NMI signal fires off. This could amount to several thousands of
+ NMIs in a matter of seconds. If a user sees the Linux kernel's "dazed and
+ confused" message in the logs or if the system gets into a hung state, then
+ the hpwdt driver can be reloaded with the "priority" module parameter set
+ (priority=1).
+
+ 1. If the kernel has not been booted with nmi_watchdog turned off then
+ edit /boot/grub/menu.lst and place the nmi_watchdog=0 at the end of the
+ currently booting kernel line.
+ 2. reboot the sever
+ 3. Once the system comes up perform a rmmod hpwdt
+ 4. insmod /lib/modules/`uname -r`/kernel/drivers/char/watchdog/hpwdt.ko priority=1
+
+ Now, the hpwdt can successfully receive and source the NMI and provide a log
+ message that details the reason for the NMI (as determined by the HP BIOS).
+
+ Below is a list of NMIs the HP BIOS understands along with the associated
+ code (reason):
+
+ No source found 00h
+
+ Uncorrectable Memory Error 01h
+
+ ASR NMI 1Bh
+
+ PCI Parity Error 20h
+
+ NMI Button Press 27h
+
+ SB_BUS_NMI 28h
+
+ ILO Doorbell NMI 29h
+
+ ILO IOP NMI 2Ah
+
+ ILO Watchdog NMI 2Bh
+
+ Proc Throt NMI 2Ch
+
+ Front Side Bus NMI 2Dh
+
+ PCI Express Error 2Fh
+
+ DMA controller NMI 30h
+
+ Hypertransport/CSI Error 31h
+
+
+
+ -- Tom Mingarelli
+ (thomas.mingarelli@hp.com)
diff --git a/Documentation/x86/00-INDEX b/Documentation/x86/00-INDEX
index dbe3377754a..f37b46d3486 100644
--- a/Documentation/x86/00-INDEX
+++ b/Documentation/x86/00-INDEX
@@ -2,3 +2,5 @@
- this file
mtrr.txt
- how to use x86 Memory Type Range Registers to increase performance
+exception-tables.txt
+ - why and how Linux kernel uses exception tables on x86
diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index e0203662f9e..8da3a795083 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -50,6 +50,10 @@ Protocol 2.08: (Kernel 2.6.26) Added crc32 checksum and ELF format
Protocol 2.09: (Kernel 2.6.26) Added a field of 64-bit physical
pointer to single linked list of struct setup_data.
+Protocol 2.10: (Kernel 2.6.31) Added a protocol for relaxed alignment
+ beyond the kernel_alignment added, new init_size and
+ pref_address fields. Added extended boot loader IDs.
+
**** MEMORY LAYOUT
The traditional memory map for the kernel loader, used for Image or
@@ -168,12 +172,13 @@ Offset Proto Name Meaning
021C/4 2.00+ ramdisk_size initrd size (set by boot loader)
0220/4 2.00+ bootsect_kludge DO NOT USE - for bootsect.S use only
0224/2 2.01+ heap_end_ptr Free memory after setup end
-0226/2 N/A pad1 Unused
+0226/1 2.02+(3 ext_loader_ver Extended boot loader version
+0227/1 2.02+(3 ext_loader_type Extended boot loader ID
0228/4 2.02+ cmd_line_ptr 32-bit pointer to the kernel command line
022C/4 2.03+ ramdisk_max Highest legal initrd address
0230/4 2.05+ kernel_alignment Physical addr alignment required for kernel
0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not
-0235/1 N/A pad2 Unused
+0235/1 2.10+ min_alignment Minimum alignment, as a power of two
0236/2 N/A pad3 Unused
0238/4 2.06+ cmdline_size Maximum size of the kernel command line
023C/4 2.07+ hardware_subarch Hardware subarchitecture
@@ -182,6 +187,8 @@ Offset Proto Name Meaning
024C/4 2.08+ payload_length Length of kernel payload
0250/8 2.09+ setup_data 64-bit physical pointer to linked list
of struct setup_data
+0258/8 2.10+ pref_address Preferred loading address
+0260/4 2.10+ init_size Linear memory required during initialization
(1) For backwards compatibility, if the setup_sects field contains 0, the
real value is 4.
@@ -190,6 +197,8 @@ Offset Proto Name Meaning
field are unusable, which means the size of a bzImage kernel
cannot be determined.
+(3) Ignored, but safe to set, for boot protocols 2.02-2.09.
+
If the "HdrS" (0x53726448) magic number is not found at offset 0x202,
the boot protocol version is "old". Loading an old kernel, the
following parameters should be assumed:
@@ -343,18 +352,32 @@ Protocol: 2.00+
0xTV here, where T is an identifier for the boot loader and V is
a version number. Otherwise, enter 0xFF here.
+ For boot loader IDs above T = 0xD, write T = 0xE to this field and
+ write the extended ID minus 0x10 to the ext_loader_type field.
+ Similarly, the ext_loader_ver field can be used to provide more than
+ four bits for the bootloader version.
+
+ For example, for T = 0x15, V = 0x234, write:
+
+ type_of_loader <- 0xE4
+ ext_loader_type <- 0x05
+ ext_loader_ver <- 0x23
+
Assigned boot loader ids:
0 LILO (0x00 reserved for pre-2.00 bootloader)
1 Loadlin
2 bootsect-loader (0x20, all other values reserved)
- 3 SYSLINUX
- 4 EtherBoot
+ 3 Syslinux
+ 4 Etherboot/gPXE
5 ELILO
7 GRUB
- 8 U-BOOT
+ 8 U-Boot
9 Xen
A Gujin
B Qemu
+ C Arcturus Networks uCbootloader
+ E Extended (see ext_loader_type)
+ F Special (0xFF = undefined)
Please contact <hpa@zytor.com> if you need a bootloader ID
value assigned.
@@ -453,6 +476,35 @@ Protocol: 2.01+
Set this field to the offset (from the beginning of the real-mode
code) of the end of the setup stack/heap, minus 0x0200.
+Field name: ext_loader_ver
+Type: write (optional)
+Offset/size: 0x226/1
+Protocol: 2.02+
+
+ This field is used as an extension of the version number in the
+ type_of_loader field. The total version number is considered to be
+ (type_of_loader & 0x0f) + (ext_loader_ver << 4).
+
+ The use of this field is boot loader specific. If not written, it
+ is zero.
+
+ Kernels prior to 2.6.31 did not recognize this field, but it is safe
+ to write for protocol version 2.02 or higher.
+
+Field name: ext_loader_type
+Type: write (obligatory if (type_of_loader & 0xf0) == 0xe0)
+Offset/size: 0x227/1
+Protocol: 2.02+
+
+ This field is used as an extension of the type number in
+ type_of_loader field. If the type in type_of_loader is 0xE, then
+ the actual type is (ext_loader_type + 0x10).
+
+ This field is ignored if the type in type_of_loader is not 0xE.
+
+ Kernels prior to 2.6.31 did not recognize this field, but it is safe
+ to write for protocol version 2.02 or higher.
+
Field name: cmd_line_ptr
Type: write (obligatory)
Offset/size: 0x228/4
@@ -482,11 +534,19 @@ Protocol: 2.03+
0x37FFFFFF, you can start your ramdisk at 0x37FE0000.)
Field name: kernel_alignment
-Type: read (reloc)
+Type: read/modify (reloc)
Offset/size: 0x230/4
-Protocol: 2.05+
+Protocol: 2.05+ (read), 2.10+ (modify)
+
+ Alignment unit required by the kernel (if relocatable_kernel is
+ true.) A relocatable kernel that is loaded at an alignment
+ incompatible with the value in this field will be realigned during
+ kernel initialization.
- Alignment unit required by the kernel (if relocatable_kernel is true.)
+ Starting with protocol version 2.10, this reflects the kernel
+ alignment preferred for optimal performance; it is possible for the
+ loader to modify this field to permit a lesser alignment. See the
+ min_alignment and pref_address field below.
Field name: relocatable_kernel
Type: read (reloc)
@@ -498,6 +558,22 @@ Protocol: 2.05+
After loading, the boot loader must set the code32_start field to
point to the loaded code, or to a boot loader hook.
+Field name: min_alignment
+Type: read (reloc)
+Offset/size: 0x235/1
+Protocol: 2.10+
+
+ This field, if nonzero, indicates as a power of two the minimum
+ alignment required, as opposed to preferred, by the kernel to boot.
+ If a boot loader makes use of this field, it should update the
+ kernel_alignment field with the alignment unit desired; typically:
+
+ kernel_alignment = 1 << min_alignment
+
+ There may be a considerable performance cost with an excessively
+ misaligned kernel. Therefore, a loader should typically try each
+ power-of-two alignment from kernel_alignment down to this alignment.
+
Field name: cmdline_size
Type: read
Offset/size: 0x238/4
@@ -582,6 +658,36 @@ Protocol: 2.09+
sure to consider the case where the linked list already contains
entries.
+Field name: pref_address
+Type: read (reloc)
+Offset/size: 0x258/8
+Protocol: 2.10+
+
+ This field, if nonzero, represents a preferred load address for the
+ kernel. A relocating bootloader should attempt to load at this
+ address if possible.
+
+ A non-relocatable kernel will unconditionally move itself and to run
+ at this address.
+
+Field name: init_size
+Type: read
+Offset/size: 0x25c/4
+
+ This field indicates the amount of linear contiguous memory starting
+ at the kernel runtime start address that the kernel needs before it
+ is capable of examining its memory map. This is not the same thing
+ as the total amount of memory the kernel needs to boot, but it can
+ be used by a relocating boot loader to help select a safe load
+ address for the kernel.
+
+ The kernel runtime start address is determined by the following algorithm:
+
+ if (relocatable_kernel)
+ runtime_start = align_up(load_address, kernel_alignment)
+ else
+ runtime_start = pref_address
+
**** THE IMAGE CHECKSUM
diff --git a/Documentation/exception.txt b/Documentation/x86/exception-tables.txt
index 2d5aded6424..32901aa36f0 100644
--- a/Documentation/exception.txt
+++ b/Documentation/x86/exception-tables.txt
@@ -1,123 +1,123 @@
- Kernel level exception handling in Linux 2.1.8
+ Kernel level exception handling in Linux
Commentary by Joerg Pommnitz <joerg@raleigh.ibm.com>
-When a process runs in kernel mode, it often has to access user
-mode memory whose address has been passed by an untrusted program.
+When a process runs in kernel mode, it often has to access user
+mode memory whose address has been passed by an untrusted program.
To protect itself the kernel has to verify this address.
-In older versions of Linux this was done with the
-int verify_area(int type, const void * addr, unsigned long size)
+In older versions of Linux this was done with the
+int verify_area(int type, const void * addr, unsigned long size)
function (which has since been replaced by access_ok()).
-This function verified that the memory area starting at address
+This function verified that the memory area starting at address
'addr' and of size 'size' was accessible for the operation specified
-in type (read or write). To do this, verify_read had to look up the
-virtual memory area (vma) that contained the address addr. In the
-normal case (correctly working program), this test was successful.
+in type (read or write). To do this, verify_read had to look up the
+virtual memory area (vma) that contained the address addr. In the
+normal case (correctly working program), this test was successful.
It only failed for a few buggy programs. In some kernel profiling
tests, this normally unneeded verification used up a considerable
amount of time.
-To overcome this situation, Linus decided to let the virtual memory
+To overcome this situation, Linus decided to let the virtual memory
hardware present in every Linux-capable CPU handle this test.
How does this work?
-Whenever the kernel tries to access an address that is currently not
-accessible, the CPU generates a page fault exception and calls the
-page fault handler
+Whenever the kernel tries to access an address that is currently not
+accessible, the CPU generates a page fault exception and calls the
+page fault handler
void do_page_fault(struct pt_regs *regs, unsigned long error_code)
-in arch/i386/mm/fault.c. The parameters on the stack are set up by
-the low level assembly glue in arch/i386/kernel/entry.S. The parameter
-regs is a pointer to the saved registers on the stack, error_code
+in arch/x86/mm/fault.c. The parameters on the stack are set up by
+the low level assembly glue in arch/x86/kernel/entry_32.S. The parameter
+regs is a pointer to the saved registers on the stack, error_code
contains a reason code for the exception.
-do_page_fault first obtains the unaccessible address from the CPU
-control register CR2. If the address is within the virtual address
-space of the process, the fault probably occurred, because the page
-was not swapped in, write protected or something similar. However,
-we are interested in the other case: the address is not valid, there
-is no vma that contains this address. In this case, the kernel jumps
-to the bad_area label.
-
-There it uses the address of the instruction that caused the exception
-(i.e. regs->eip) to find an address where the execution can continue
-(fixup). If this search is successful, the fault handler modifies the
-return address (again regs->eip) and returns. The execution will
+do_page_fault first obtains the unaccessible address from the CPU
+control register CR2. If the address is within the virtual address
+space of the process, the fault probably occurred, because the page
+was not swapped in, write protected or something similar. However,
+we are interested in the other case: the address is not valid, there
+is no vma that contains this address. In this case, the kernel jumps
+to the bad_area label.
+
+There it uses the address of the instruction that caused the exception
+(i.e. regs->eip) to find an address where the execution can continue
+(fixup). If this search is successful, the fault handler modifies the
+return address (again regs->eip) and returns. The execution will
continue at the address in fixup.
Where does fixup point to?
-Since we jump to the contents of fixup, fixup obviously points
-to executable code. This code is hidden inside the user access macros.
-I have picked the get_user macro defined in include/asm/uaccess.h as an
-example. The definition is somewhat hard to follow, so let's peek at
+Since we jump to the contents of fixup, fixup obviously points
+to executable code. This code is hidden inside the user access macros.
+I have picked the get_user macro defined in arch/x86/include/asm/uaccess.h
+as an example. The definition is somewhat hard to follow, so let's peek at
the code generated by the preprocessor and the compiler. I selected
-the get_user call in drivers/char/console.c for a detailed examination.
+the get_user call in drivers/char/sysrq.c for a detailed examination.
-The original code in console.c line 1405:
+The original code in sysrq.c line 587:
get_user(c, buf);
The preprocessor output (edited to become somewhat readable):
(
- {
- long __gu_err = - 14 , __gu_val = 0;
- const __typeof__(*( ( buf ) )) *__gu_addr = ((buf));
- if (((((0 + current_set[0])->tss.segment) == 0x18 ) ||
- (((sizeof(*(buf))) <= 0xC0000000UL) &&
- ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))
+ {
+ long __gu_err = - 14 , __gu_val = 0;
+ const __typeof__(*( ( buf ) )) *__gu_addr = ((buf));
+ if (((((0 + current_set[0])->tss.segment) == 0x18 ) ||
+ (((sizeof(*(buf))) <= 0xC0000000UL) &&
+ ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))
do {
- __gu_err = 0;
- switch ((sizeof(*(buf)))) {
- case 1:
- __asm__ __volatile__(
- "1: mov" "b" " %2,%" "b" "1\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl %3,%0\n"
- " xor" "b" " %" "b" "1,%" "b" "1\n"
- " jmp 2b\n"
- ".section __ex_table,\"a\"\n"
- " .align 4\n"
- " .long 1b,3b\n"
+ __gu_err = 0;
+ switch ((sizeof(*(buf)))) {
+ case 1:
+ __asm__ __volatile__(
+ "1: mov" "b" " %2,%" "b" "1\n"
+ "2:\n"
+ ".section .fixup,\"ax\"\n"
+ "3: movl %3,%0\n"
+ " xor" "b" " %" "b" "1,%" "b" "1\n"
+ " jmp 2b\n"
+ ".section __ex_table,\"a\"\n"
+ " .align 4\n"
+ " .long 1b,3b\n"
".text" : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *)
- ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ;
- break;
- case 2:
+ ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ;
+ break;
+ case 2:
__asm__ __volatile__(
- "1: mov" "w" " %2,%" "w" "1\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl %3,%0\n"
- " xor" "w" " %" "w" "1,%" "w" "1\n"
- " jmp 2b\n"
- ".section __ex_table,\"a\"\n"
- " .align 4\n"
- " .long 1b,3b\n"
+ "1: mov" "w" " %2,%" "w" "1\n"
+ "2:\n"
+ ".section .fixup,\"ax\"\n"
+ "3: movl %3,%0\n"
+ " xor" "w" " %" "w" "1,%" "w" "1\n"
+ " jmp 2b\n"
+ ".section __ex_table,\"a\"\n"
+ " .align 4\n"
+ " .long 1b,3b\n"
".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
- ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err ));
- break;
- case 4:
- __asm__ __volatile__(
- "1: mov" "l" " %2,%" "" "1\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl %3,%0\n"
- " xor" "l" " %" "" "1,%" "" "1\n"
- " jmp 2b\n"
- ".section __ex_table,\"a\"\n"
- " .align 4\n" " .long 1b,3b\n"
+ ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err ));
+ break;
+ case 4:
+ __asm__ __volatile__(
+ "1: mov" "l" " %2,%" "" "1\n"
+ "2:\n"
+ ".section .fixup,\"ax\"\n"
+ "3: movl %3,%0\n"
+ " xor" "l" " %" "" "1,%" "" "1\n"
+ " jmp 2b\n"
+ ".section __ex_table,\"a\"\n"
+ " .align 4\n" " .long 1b,3b\n"
".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
- ( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err));
- break;
- default:
- (__gu_val) = __get_user_bad();
- }
- } while (0) ;
- ((c)) = (__typeof__(*((buf))))__gu_val;
+ ( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err));
+ break;
+ default:
+ (__gu_val) = __get_user_bad();
+ }
+ } while (0) ;
+ ((c)) = (__typeof__(*((buf))))__gu_val;
__gu_err;
}
);
@@ -127,12 +127,12 @@ see what code gcc generates:
> xorl %edx,%edx
> movl current_set,%eax
- > cmpl $24,788(%eax)
- > je .L1424
+ > cmpl $24,788(%eax)
+ > je .L1424
> cmpl $-1073741825,64(%esp)
- > ja .L1423
+ > ja .L1423
> .L1424:
- > movl %edx,%eax
+ > movl %edx,%eax
> movl 64(%esp),%ebx
> #APP
> 1: movb (%ebx),%dl /* this is the actual user access */
@@ -149,17 +149,17 @@ see what code gcc generates:
> .L1423:
> movzbl %dl,%esi
-The optimizer does a good job and gives us something we can actually
-understand. Can we? The actual user access is quite obvious. Thanks
-to the unified address space we can just access the address in user
+The optimizer does a good job and gives us something we can actually
+understand. Can we? The actual user access is quite obvious. Thanks
+to the unified address space we can just access the address in user
memory. But what does the .section stuff do?????
To understand this we have to look at the final kernel:
> objdump --section-headers vmlinux
- >
+ >
> vmlinux: file format elf32-i386
- >
+ >
> Sections:
> Idx Name Size VMA LMA File off Algn
> 0 .text 00098f40 c0100000 c0100000 00001000 2**4
@@ -198,18 +198,18 @@ final kernel executable:
The whole user memory access is reduced to 10 x86 machine instructions.
The instructions bracketed in the .section directives are no longer
-in the normal execution path. They are located in a different section
+in the normal execution path. They are located in a different section
of the executable file:
> objdump --disassemble --section=.fixup vmlinux
- >
+ >
> c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax
> c0199ffa <.fixup+10ba> xorb %dl,%dl
> c0199ffc <.fixup+10bc> jmp c017e7a7 <do_con_write+e3>
And finally:
> objdump --full-contents --section=__ex_table vmlinux
- >
+ >
> c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0 ................
> c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0 ................
> c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0 ................
@@ -235,8 +235,8 @@ sections in the ELF object file. So the instructions
ended up in the .fixup section of the object file and the addresses
.long 1b,3b
ended up in the __ex_table section of the object file. 1b and 3b
-are local labels. The local label 1b (1b stands for next label 1
-backward) is the address of the instruction that might fault, i.e.
+are local labels. The local label 1b (1b stands for next label 1
+backward) is the address of the instruction that might fault, i.e.
in our case the address of the label 1 is c017e7a5:
the original assembly code: > 1: movb (%ebx),%dl
and linked in vmlinux : > c017e7a5 <do_con_write+e1> movb (%ebx),%dl
@@ -254,7 +254,7 @@ The assembly code
becomes the value pair
> c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................
^this is ^this is
- 1b 3b
+ 1b 3b
c017e7a5,c0199ff5 in the exception table of the kernel.
So, what actually happens if a fault from kernel mode with no suitable
@@ -266,9 +266,9 @@ vma occurs?
3.) CPU calls do_page_fault
4.) do page fault calls search_exception_table (regs->eip == c017e7a5);
5.) search_exception_table looks up the address c017e7a5 in the
- exception table (i.e. the contents of the ELF section __ex_table)
+ exception table (i.e. the contents of the ELF section __ex_table)
and returns the address of the associated fault handle code c0199ff5.
-6.) do_page_fault modifies its own return address to point to the fault
+6.) do_page_fault modifies its own return address to point to the fault
handle code and returns.
7.) execution continues in the fault handling code.
8.) 8a) EAX becomes -EFAULT (== -14)
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 34c13040a71..29a6ff8bc7d 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -5,21 +5,51 @@ only the AMD64 specific ones are listed here.
Machine check
- mce=off disable machine check
- mce=bootlog Enable logging of machine checks left over from booting.
- Disabled by default on AMD because some BIOS leave bogus ones.
- If your BIOS doesn't do that it's a good idea to enable though
- to make sure you log even machine check events that result
- in a reboot. On Intel systems it is enabled by default.
+ Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables.
+
+ mce=off
+ Disable machine check
+ mce=no_cmci
+ Disable CMCI(Corrected Machine Check Interrupt) that
+ Intel processor supports. Usually this disablement is
+ not recommended, but it might be handy if your hardware
+ is misbehaving.
+ Note that you'll get more problems without CMCI than with
+ due to the shared banks, i.e. you might get duplicated
+ error logs.
+ mce=dont_log_ce
+ Don't make logs for corrected errors. All events reported
+ as corrected are silently cleared by OS.
+ This option will be useful if you have no interest in any
+ of corrected errors.
+ mce=ignore_ce
+ Disable features for corrected errors, e.g. polling timer
+ and CMCI. All events reported as corrected are not cleared
+ by OS and remained in its error banks.
+ Usually this disablement is not recommended, however if
+ there is an agent checking/clearing corrected errors
+ (e.g. BIOS or hardware monitoring applications), conflicting
+ with OS's error handling, and you cannot deactivate the agent,
+ then this option will be a help.
+ mce=bootlog
+ Enable logging of machine checks left over from booting.
+ Disabled by default on AMD because some BIOS leave bogus ones.
+ If your BIOS doesn't do that it's a good idea to enable though
+ to make sure you log even machine check events that result
+ in a reboot. On Intel systems it is enabled by default.
mce=nobootlog
Disable boot machine check logging.
- mce=tolerancelevel (number)
+ mce=tolerancelevel[,monarchtimeout] (number,number)
+ tolerance levels:
0: always panic on uncorrected errors, log corrected errors
1: panic or SIGBUS on uncorrected errors, log corrected errors
2: SIGBUS or log uncorrected errors, log corrected errors
3: never panic or SIGBUS, log all errors (for testing only)
Default is 1
Can be also set using sysfs which is preferable.
+ monarchtimeout:
+ Sets the time in us to wait for other CPUs on machine checks. 0
+ to disable.
nomce (for compatibility with i386): same as mce=off
@@ -150,11 +180,6 @@ NUMA
Otherwise, the remaining system RAM is allocated to an
additional node.
- numa=hotadd=percent
- Only allow hotadd memory to preallocate page structures upto
- percent of already available memory.
- numa=hotadd=0 will disable hotadd memory.
-
ACPI
acpi=off Don't enable ACPI
diff --git a/Documentation/x86/x86_64/machinecheck b/Documentation/x86/x86_64/machinecheck
index a05e58e7b15..b1fb3027328 100644
--- a/Documentation/x86/x86_64/machinecheck
+++ b/Documentation/x86/x86_64/machinecheck
@@ -41,7 +41,9 @@ check_interval
the polling interval. When the poller stops finding MCEs, it
triggers an exponential backoff (poll less often) on the polling
interval. The check_interval variable is both the initial and
- maximum polling interval.
+ maximum polling interval. 0 means no polling for corrected machine
+ check errors (but some corrected errors might be still reported
+ in other ways)
tolerant
Tolerance level. When a machine check exception occurs for a non
@@ -67,6 +69,10 @@ trigger
Program to run when a machine check event is detected.
This is an alternative to running mcelog regularly from cron
and allows to detect events faster.
+monarch_timeout
+ How long to wait for the other CPUs to machine check too on a
+ exception. 0 to disable waiting for other CPUs.
+ Unit: us
TBD document entries for AMD threshold interrupt configuration
diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 29b52b14d0b..d6498e3cd71 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -6,10 +6,11 @@ Virtual memory map with 4 level page tables:
0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole
-ffff880000000000 - ffffc0ffffffffff (=57 TB) direct mapping of all phys. memory
-ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole
-ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space
-ffffe20000000000 - ffffe2ffffffffff (=40 bits) virtual memory map (1TB)
+ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
+ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
+ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
+ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
+ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0
ffffffffa0000000 - fffffffffff00000 (=1536 MB) module mapping space