aboutsummaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/Changes10
-rw-r--r--Documentation/DocBook/libata.tmpl356
-rw-r--r--Documentation/SubmittingPatches70
-rw-r--r--Documentation/keys.txt74
4 files changed, 490 insertions, 20 deletions
diff --git a/Documentation/Changes b/Documentation/Changes
index 5eaab0441d7..27232be26e1 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -237,6 +237,12 @@ udev
udev is a userspace application for populating /dev dynamically with
only entries for devices actually present. udev replaces devfs.
+FUSE
+----
+
+Needs libfuse 2.4.0 or later. Absolute minimum is 2.3.0 but mount
+options 'direct_io' and 'kernel_cache' won't work.
+
Networking
==========
@@ -390,6 +396,10 @@ udev
----
o <http://www.kernel.org/pub/linux/utils/kernel/hotplug/udev.html>
+FUSE
+----
+o <http://sourceforge.net/projects/fuse>
+
Networking
**********
diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl
index 375ae760dc1..b2ec780bcda 100644
--- a/Documentation/DocBook/libata.tmpl
+++ b/Documentation/DocBook/libata.tmpl
@@ -415,6 +415,362 @@ and other resources, etc.
</sect1>
</chapter>
+ <chapter id="libataEH">
+ <title>Error handling</title>
+
+ <para>
+ This chapter describes how errors are handled under libata.
+ Readers are advised to read SCSI EH
+ (Documentation/scsi/scsi_eh.txt) and ATA exceptions doc first.
+ </para>
+
+ <sect1><title>Origins of commands</title>
+ <para>
+ In libata, a command is represented with struct ata_queued_cmd
+ or qc. qc's are preallocated during port initialization and
+ repetitively used for command executions. Currently only one
+ qc is allocated per port but yet-to-be-merged NCQ branch
+ allocates one for each tag and maps each qc to NCQ tag 1-to-1.
+ </para>
+ <para>
+ libata commands can originate from two sources - libata itself
+ and SCSI midlayer. libata internal commands are used for
+ initialization and error handling. All normal blk requests
+ and commands for SCSI emulation are passed as SCSI commands
+ through queuecommand callback of SCSI host template.
+ </para>
+ </sect1>
+
+ <sect1><title>How commands are issued</title>
+
+ <variablelist>
+
+ <varlistentry><term>Internal commands</term>
+ <listitem>
+ <para>
+ First, qc is allocated and initialized using
+ ata_qc_new_init(). Although ata_qc_new_init() doesn't
+ implement any wait or retry mechanism when qc is not
+ available, internal commands are currently issued only during
+ initialization and error recovery, so no other command is
+ active and allocation is guaranteed to succeed.
+ </para>
+ <para>
+ Once allocated qc's taskfile is initialized for the command to
+ be executed. qc currently has two mechanisms to notify
+ completion. One is via qc->complete_fn() callback and the
+ other is completion qc->waiting. qc->complete_fn() callback
+ is the asynchronous path used by normal SCSI translated
+ commands and qc->waiting is the synchronous (issuer sleeps in
+ process context) path used by internal commands.
+ </para>
+ <para>
+ Once initialization is complete, host_set lock is acquired
+ and the qc is issued.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>SCSI commands</term>
+ <listitem>
+ <para>
+ All libata drivers use ata_scsi_queuecmd() as
+ hostt->queuecommand callback. scmds can either be simulated
+ or translated. No qc is involved in processing a simulated
+ scmd. The result is computed right away and the scmd is
+ completed.
+ </para>
+ <para>
+ For a translated scmd, ata_qc_new_init() is invoked to
+ allocate a qc and the scmd is translated into the qc. SCSI
+ midlayer's completion notification function pointer is stored
+ into qc->scsidone.
+ </para>
+ <para>
+ qc->complete_fn() callback is used for completion
+ notification. ATA commands use ata_scsi_qc_complete() while
+ ATAPI commands use atapi_qc_complete(). Both functions end up
+ calling qc->scsidone to notify upper layer when the qc is
+ finished. After translation is completed, the qc is issued
+ with ata_qc_issue().
+ </para>
+ <para>
+ Note that SCSI midlayer invokes hostt->queuecommand while
+ holding host_set lock, so all above occur while holding
+ host_set lock.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </sect1>
+
+ <sect1><title>How commands are processed</title>
+ <para>
+ Depending on which protocol and which controller are used,
+ commands are processed differently. For the purpose of
+ discussion, a controller which uses taskfile interface and all
+ standard callbacks is assumed.
+ </para>
+ <para>
+ Currently 6 ATA command protocols are used. They can be
+ sorted into the following four categories according to how
+ they are processed.
+ </para>
+
+ <variablelist>
+ <varlistentry><term>ATA NO DATA or DMA</term>
+ <listitem>
+ <para>
+ ATA_PROT_NODATA and ATA_PROT_DMA fall into this category.
+ These types of commands don't require any software
+ intervention once issued. Device will raise interrupt on
+ completion.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>ATA PIO</term>
+ <listitem>
+ <para>
+ ATA_PROT_PIO is in this category. libata currently
+ implements PIO with polling. ATA_NIEN bit is set to turn
+ off interrupt and pio_task on ata_wq performs polling and
+ IO.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>ATAPI NODATA or DMA</term>
+ <listitem>
+ <para>
+ ATA_PROT_ATAPI_NODATA and ATA_PROT_ATAPI_DMA are in this
+ category. packet_task is used to poll BSY bit after
+ issuing PACKET command. Once BSY is turned off by the
+ device, packet_task transfers CDB and hands off processing
+ to interrupt handler.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>ATAPI PIO</term>
+ <listitem>
+ <para>
+ ATA_PROT_ATAPI is in this category. ATA_NIEN bit is set
+ and, as in ATAPI NODATA or DMA, packet_task submits cdb.
+ However, after submitting cdb, further processing (data
+ transfer) is handed off to pio_task.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </sect1>
+
+ <sect1><title>How commands are completed</title>
+ <para>
+ Once issued, all qc's are either completed with
+ ata_qc_complete() or time out. For commands which are handled
+ by interrupts, ata_host_intr() invokes ata_qc_complete(), and,
+ for PIO tasks, pio_task invokes ata_qc_complete(). In error
+ cases, packet_task may also complete commands.
+ </para>
+ <para>
+ ata_qc_complete() does the following.
+ </para>
+
+ <orderedlist>
+
+ <listitem>
+ <para>
+ DMA memory is unmapped.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ ATA_QCFLAG_ACTIVE is clared from qc->flags.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ qc->complete_fn() callback is invoked. If the return value of
+ the callback is not zero. Completion is short circuited and
+ ata_qc_complete() returns.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ __ata_qc_complete() is called, which does
+ <orderedlist>
+
+ <listitem>
+ <para>
+ qc->flags is cleared to zero.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ ap->active_tag and qc->tag are poisoned.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ qc->waiting is claread &amp; completed (in that order).
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ qc is deallocated by clearing appropriate bit in ap->qactive.
+ </para>
+ </listitem>
+
+ </orderedlist>
+ </para>
+ </listitem>
+
+ </orderedlist>
+
+ <para>
+ So, it basically notifies upper layer and deallocates qc. One
+ exception is short-circuit path in #3 which is used by
+ atapi_qc_complete().
+ </para>
+ <para>
+ For all non-ATAPI commands, whether it fails or not, almost
+ the same code path is taken and very little error handling
+ takes place. A qc is completed with success status if it
+ succeeded, with failed status otherwise.
+ </para>
+ <para>
+ However, failed ATAPI commands require more handling as
+ REQUEST SENSE is needed to acquire sense data. If an ATAPI
+ command fails, ata_qc_complete() is invoked with error status,
+ which in turn invokes atapi_qc_complete() via
+ qc->complete_fn() callback.
+ </para>
+ <para>
+ This makes atapi_qc_complete() set scmd->result to
+ SAM_STAT_CHECK_CONDITION, complete the scmd and return 1. As
+ the sense data is empty but scmd->result is CHECK CONDITION,
+ SCSI midlayer will invoke EH for the scmd, and returning 1
+ makes ata_qc_complete() to return without deallocating the qc.
+ This leads us to ata_scsi_error() with partially completed qc.
+ </para>
+
+ </sect1>
+
+ <sect1><title>ata_scsi_error()</title>
+ <para>
+ ata_scsi_error() is the current hostt->eh_strategy_handler()
+ for libata. As discussed above, this will be entered in two
+ cases - timeout and ATAPI error completion. This function
+ calls low level libata driver's eng_timeout() callback, the
+ standard callback for which is ata_eng_timeout(). It checks
+ if a qc is active and calls ata_qc_timeout() on the qc if so.
+ Actual error handling occurs in ata_qc_timeout().
+ </para>
+ <para>
+ If EH is invoked for timeout, ata_qc_timeout() stops BMDMA and
+ completes the qc. Note that as we're currently in EH, we
+ cannot call scsi_done. As described in SCSI EH doc, a
+ recovered scmd should be either retried with
+ scsi_queue_insert() or finished with scsi_finish_command().
+ Here, we override qc->scsidone with scsi_finish_command() and
+ calls ata_qc_complete().
+ </para>
+ <para>
+ If EH is invoked due to a failed ATAPI qc, the qc here is
+ completed but not deallocated. The purpose of this
+ half-completion is to use the qc as place holder to make EH
+ code reach this place. This is a bit hackish, but it works.
+ </para>
+ <para>
+ Once control reaches here, the qc is deallocated by invoking
+ __ata_qc_complete() explicitly. Then, internal qc for REQUEST
+ SENSE is issued. Once sense data is acquired, scmd is
+ finished by directly invoking scsi_finish_command() on the
+ scmd. Note that as we already have completed and deallocated
+ the qc which was associated with the scmd, we don't need
+ to/cannot call ata_qc_complete() again.
+ </para>
+
+ </sect1>
+
+ <sect1><title>Problems with the current EH</title>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ Error representation is too crude. Currently any and all
+ error conditions are represented with ATA STATUS and ERROR
+ registers. Errors which aren't ATA device errors are treated
+ as ATA device errors by setting ATA_ERR bit. Better error
+ descriptor which can properly represent ATA and other
+ errors/exceptions is needed.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ When handling timeouts, no action is taken to make device
+ forget about the timed out command and ready for new commands.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ EH handling via ata_scsi_error() is not properly protected
+ from usual command processing. On EH entrance, the device is
+ not in quiescent state. Timed out commands may succeed or
+ fail any time. pio_task and atapi_task may still be running.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Too weak error recovery. Devices / controllers causing HSM
+ mismatch errors and other errors quite often require reset to
+ return to known state. Also, advanced error handling is
+ necessary to support features like NCQ and hotplug.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ ATA errors are directly handled in the interrupt handler and
+ PIO errors in pio_task. This is problematic for advanced
+ error handling for the following reasons.
+ </para>
+ <para>
+ First, advanced error handling often requires context and
+ internal qc execution.
+ </para>
+ <para>
+ Second, even a simple failure (say, CRC error) needs
+ information gathering and could trigger complex error handling
+ (say, resetting &amp; reconfiguring). Having multiple code
+ paths to gather information, enter EH and trigger actions
+ makes life painful.
+ </para>
+ <para>
+ Third, scattered EH code makes implementing low level drivers
+ difficult. Low level drivers override libata callbacks. If
+ EH is scattered over several places, each affected callbacks
+ should perform its part of error handling. This can be error
+ prone and painful.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+ </sect1>
+ </chapter>
+
<chapter id="libataExt">
<title>libata Library</title>
!Edrivers/scsi/libata-core.c
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 7f43b040311..1d96efec5e8 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -301,8 +301,68 @@ now, but you can do this to mark internal company procedures or just
point out some special detail about the sign-off.
+12) The canonical patch format
-12) More references for submitting patches
+The canonical patch subject line is:
+
+ Subject: [PATCH 001/123] [<area>:] <explanation>
+
+The canonical patch message body contains the following:
+
+ - A "from" line specifying the patch author.
+
+ - An empty line.
+
+ - The body of the explanation, which will be copied to the
+ permanent changelog to describe this patch.
+
+ - The "Signed-off-by:" lines, described above, which will
+ also go in the changelog.
+
+ - A marker line containing simply "---".
+
+ - Any additional comments not suitable for the changelog.
+
+ - The actual patch (diff output).
+
+The Subject line format makes it very easy to sort the emails
+alphabetically by subject line - pretty much any email reader will
+support that - since because the sequence number is zero-padded,
+the numerical and alphabetic sort is the same.
+
+See further details on how to phrase the "<explanation>" in the
+"Subject:" line in Andrew Morton's "The perfect patch", referenced
+below.
+
+The "from" line must be the very first line in the message body,
+and has the form:
+
+ From: Original Author <author@example.com>
+
+The "from" line specifies who will be credited as the author of the
+patch in the permanent changelog. If the "from" line is missing,
+then the "From:" line from the email header will be used to determine
+the patch author in the changelog.
+
+The explanation body will be committed to the permanent source
+changelog, so should make sense to a competent reader who has long
+since forgotten the immediate details of the discussion that might
+have led to this patch.
+
+The "---" marker line serves the essential purpose of marking for patch
+handling tools where the changelog message ends.
+
+One good use for the additional comments after the "---" marker is for
+a diffstat, to show what files have changed, and the number of inserted
+and deleted lines per file. A diffstat is especially useful on bigger
+patches. Other comments relevant only to the moment or the maintainer,
+not suitable for the permanent changelog, should also go here.
+
+See more details on the proper patch format in the following
+references.
+
+
+13) More references for submitting patches
Andrew Morton, "The perfect patch" (tpp).
<http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt>
@@ -310,6 +370,14 @@ Andrew Morton, "The perfect patch" (tpp).
Jeff Garzik, "Linux kernel patch submission format."
<http://linux.yyz.us/patch-format.html>
+Greg KH, "How to piss off a kernel subsystem maintainer"
+ <http://www.kroah.com/log/2005/03/31/>
+
+Kernel Documentation/CodingStyle
+ <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle>
+
+Linus Torvald's mail on the canonical patch format:
+ <http://lkml.org/lkml/2005/4/7/183>
-----------------------------------
diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 0321ded4b9a..b22e7c8d059 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -195,8 +195,8 @@ KEY ACCESS PERMISSIONS
======================
Keys have an owner user ID, a group access ID, and a permissions mask. The mask
-has up to eight bits each for user, group and other access. Only five of each
-set of eight bits are defined. These permissions granted are:
+has up to eight bits each for possessor, user, group and other access. Only
+five of each set of eight bits are defined. These permissions granted are:
(*) View
@@ -241,16 +241,16 @@ about the status of the key service:
type, description and permissions. The payload of the key is not available
this way:
- SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY
- 00000001 I----- 39 perm 1f0000 0 0 keyring _uid_ses.0: 1/4
- 00000002 I----- 2 perm 1f0000 0 0 keyring _uid.0: empty
- 00000007 I----- 1 perm 1f0000 0 0 keyring _pid.1: empty
- 0000018d I----- 1 perm 1f0000 0 0 keyring _pid.412: empty
- 000004d2 I--Q-- 1 perm 1f0000 32 -1 keyring _uid.32: 1/4
- 000004d3 I--Q-- 3 perm 1f0000 32 -1 keyring _uid_ses.32: empty
- 00000892 I--QU- 1 perm 1f0000 0 0 user metal:copper: 0
- 00000893 I--Q-N 1 35s 1f0000 0 0 user metal:silver: 0
- 00000894 I--Q-- 1 10h 1f0000 0 0 user metal:gold: 0
+ SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY
+ 00000001 I----- 39 perm 1f1f0000 0 0 keyring _uid_ses.0: 1/4
+ 00000002 I----- 2 perm 1f1f0000 0 0 keyring _uid.0: empty
+ 00000007 I----- 1 perm 1f1f0000 0 0 keyring _pid.1: empty
+ 0000018d I----- 1 perm 1f1f0000 0 0 keyring _pid.412: empty
+ 000004d2 I--Q-- 1 perm 1f1f0000 32 -1 keyring _uid.32: 1/4
+ 000004d3 I--Q-- 3 perm 1f1f0000 32 -1 keyring _uid_ses.32: empty
+ 00000892 I--QU- 1 perm 1f000000 0 0 user metal:copper: 0
+ 00000893 I--Q-N 1 35s 1f1f0000 0 0 user metal:silver: 0
+ 00000894 I--Q-- 1 10h 001f0000 0 0 user metal:gold: 0
The flags are:
@@ -637,6 +637,34 @@ call, and the key released upon close. How to deal with conflicting keys due to
two different users opening the same file is left to the filesystem author to
solve.
+Note that there are two different types of pointers to keys that may be
+encountered:
+
+ (*) struct key *
+
+ This simply points to the key structure itself. Key structures will be at
+ least four-byte aligned.
+
+ (*) key_ref_t
+
+ This is equivalent to a struct key *, but the least significant bit is set
+ if the caller "possesses" the key. By "possession" it is meant that the
+ calling processes has a searchable link to the key from one of its
+ keyrings. There are three functions for dealing with these:
+
+ key_ref_t make_key_ref(const struct key *key,
+ unsigned long possession);
+
+ struct key *key_ref_to_ptr(const key_ref_t key_ref);
+
+ unsigned long is_key_possessed(const key_ref_t key_ref);
+
+ The first function constructs a key reference from a key pointer and
+ possession information (which must be 0 or 1 and not any other value).
+
+ The second function retrieves the key pointer from a reference and the
+ third retrieves the possession flag.
+
When accessing a key's payload contents, certain precautions must be taken to
prevent access vs modification races. See the section "Notes on accessing
payload contents" for more information.
@@ -665,7 +693,11 @@ payload contents" for more information.
void key_put(struct key *key);
- This can be called from interrupt context. If CONFIG_KEYS is not set then
+ Or:
+
+ void key_ref_put(key_ref_t key_ref);
+
+ These can be called from interrupt context. If CONFIG_KEYS is not set then
the argument will not be parsed.
@@ -689,13 +721,17 @@ payload contents" for more information.
(*) If a keyring was found in the search, this can be further searched by:
- struct key *keyring_search(struct key *keyring,
- const struct key_type *type,
- const char *description)
+ key_ref_t keyring_search(key_ref_t keyring_ref,
+ const struct key_type *type,
+ const char *description)
This searches the keyring tree specified for a matching key. Error ENOKEY
- is returned upon failure. If successful, the returned key will need to be
- released.
+ is returned upon failure (use IS_ERR/PTR_ERR to determine). If successful,
+ the returned key will need to be released.
+
+ The possession attribute from the keyring reference is used to control
+ access through the permissions mask and is propagated to the returned key
+ reference pointer if successful.
(*) To check the validity of a key, this function can be called:
@@ -732,7 +768,7 @@ More complex payload contents must be allocated and a pointer to them set in
key->payload.data. One of the following ways must be selected to access the
data:
- (1) Unmodifyable key type.
+ (1) Unmodifiable key type.
If the key type does not have a modify method, then the key's payload can
be accessed without any form of locking, provided that it's known to be