aboutsummaryrefslogtreecommitdiff
path: root/Documentation/networking
diff options
context:
space:
mode:
authorIngo Molnar <mingo@elte.hu>2008-07-31 18:43:41 +0200
committerIngo Molnar <mingo@elte.hu>2008-07-31 18:43:41 +0200
commit85e9ca333d03fbd56b9e123c8456f0d98e20faad (patch)
tree7bb15ada5f536950efa23ad60ea9eea60380ca1c /Documentation/networking
parenta300bec952127d9a15e666b391bb35c9aecb3002 (diff)
parent6e86841d05f371b5b9b86ce76c02aaee83352298 (diff)
Merge branch 'linus' into timers/hpet
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/bonding.txt112
-rw-r--r--Documentation/networking/can.txt4
-rw-r--r--Documentation/networking/dm9000.txt167
-rw-r--r--Documentation/networking/e1000.txt14
-rw-r--r--Documentation/networking/ip-sysctl.txt289
-rw-r--r--Documentation/networking/ixgb.txt419
-rw-r--r--Documentation/networking/mac80211_hwsim/README67
-rw-r--r--Documentation/networking/mac80211_hwsim/hostapd.conf11
-rw-r--r--Documentation/networking/mac80211_hwsim/wpa_supplicant.conf10
-rw-r--r--Documentation/networking/multiqueue.txt90
-rw-r--r--Documentation/networking/packet_mmap.txt2
-rw-r--r--Documentation/networking/s2io.txt13
-rw-r--r--Documentation/networking/tc-actions-env-rules.txt15
-rw-r--r--Documentation/networking/udplite.txt2
14 files changed, 911 insertions, 304 deletions
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index a0cda062bc3..688dfe1e6b7 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -289,35 +289,73 @@ downdelay
fail_over_mac
Specifies whether active-backup mode should set all slaves to
- the same MAC address (the traditional behavior), or, when
- enabled, change the bond's MAC address when changing the
- active interface (i.e., fail over the MAC address itself).
-
- Fail over MAC is useful for devices that cannot ever alter
- their MAC address, or for devices that refuse incoming
- broadcasts with their own source MAC (which interferes with
- the ARP monitor).
-
- The down side of fail over MAC is that every device on the
- network must be updated via gratuitous ARP, vs. just updating
- a switch or set of switches (which often takes place for any
- traffic, not just ARP traffic, if the switch snoops incoming
- traffic to update its tables) for the traditional method. If
- the gratuitous ARP is lost, communication may be disrupted.
-
- When fail over MAC is used in conjuction with the mii monitor,
- devices which assert link up prior to being able to actually
- transmit and receive are particularly susecptible to loss of
- the gratuitous ARP, and an appropriate updelay setting may be
- required.
-
- A value of 0 disables fail over MAC, and is the default. A
- value of 1 enables fail over MAC. This option is enabled
- automatically if the first slave added cannot change its MAC
- address. This option may be modified via sysfs only when no
- slaves are present in the bond.
-
- This option was added in bonding version 3.2.0.
+ the same MAC address at enslavement (the traditional
+ behavior), or, when enabled, perform special handling of the
+ bond's MAC address in accordance with the selected policy.
+
+ Possible values are:
+
+ none or 0
+
+ This setting disables fail_over_mac, and causes
+ bonding to set all slaves of an active-backup bond to
+ the same MAC address at enslavement time. This is the
+ default.
+
+ active or 1
+
+ The "active" fail_over_mac policy indicates that the
+ MAC address of the bond should always be the MAC
+ address of the currently active slave. The MAC
+ address of the slaves is not changed; instead, the MAC
+ address of the bond changes during a failover.
+
+ This policy is useful for devices that cannot ever
+ alter their MAC address, or for devices that refuse
+ incoming broadcasts with their own source MAC (which
+ interferes with the ARP monitor).
+
+ The down side of this policy is that every device on
+ the network must be updated via gratuitous ARP,
+ vs. just updating a switch or set of switches (which
+ often takes place for any traffic, not just ARP
+ traffic, if the switch snoops incoming traffic to
+ update its tables) for the traditional method. If the
+ gratuitous ARP is lost, communication may be
+ disrupted.
+
+ When this policy is used in conjuction with the mii
+ monitor, devices which assert link up prior to being
+ able to actually transmit and receive are particularly
+ susecptible to loss of the gratuitous ARP, and an
+ appropriate updelay setting may be required.
+
+ follow or 2
+
+ The "follow" fail_over_mac policy causes the MAC
+ address of the bond to be selected normally (normally
+ the MAC address of the first slave added to the bond).
+ However, the second and subsequent slaves are not set
+ to this MAC address while they are in a backup role; a
+ slave is programmed with the bond's MAC address at
+ failover time (and the formerly active slave receives
+ the newly active slave's MAC address).
+
+ This policy is useful for multiport devices that
+ either become confused or incur a performance penalty
+ when multiple ports are programmed with the same MAC
+ address.
+
+
+ The default policy is none, unless the first slave cannot
+ change its MAC address, in which case the active policy is
+ selected by default.
+
+ This option may be modified via sysfs only when no slaves are
+ present in the bond.
+
+ This option was added in bonding version 3.2.0. The "follow"
+ policy was added in bonding version 3.3.0.
lacp_rate
@@ -338,7 +376,8 @@ max_bonds
Specifies the number of bonding devices to create for this
instance of the bonding driver. E.g., if max_bonds is 3, and
the bonding driver is not already loaded, then bond0, bond1
- and bond2 will be created. The default value is 1.
+ and bond2 will be created. The default value is 1. Specifying
+ a value of 0 will load bonding, but will not create any devices.
miimon
@@ -501,6 +540,17 @@ mode
swapped with the new curr_active_slave that was
chosen.
+num_grat_arp
+
+ Specifies the number of gratuitous ARPs to be issued after a
+ failover event. One gratuitous ARP is issued immediately after
+ the failover, subsequent ARPs are sent at a rate of one per link
+ monitor interval (arp_interval or miimon, whichever is active).
+
+ The valid range is 0 - 255; the default value is 1. This option
+ affects only the active-backup mode. This option was added for
+ bonding version 3.3.0.
+
primary
A string (eth0, eth2, etc) specifying which slave is the
@@ -581,7 +631,7 @@ xmit_hash_policy
in environments where a layer3 gateway device is
required to reach most destinations.
- This algorithm is 802.3ad complient.
+ This algorithm is 802.3ad compliant.
layer3+4
diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index 641d2afacff..297ba7b1cca 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -186,7 +186,7 @@ solution for a couple of reasons:
The Linux network devices (by default) just can handle the
transmission and reception of media dependent frames. Due to the
- arbritration on the CAN bus the transmission of a low prio CAN-ID
+ arbitration on the CAN bus the transmission of a low prio CAN-ID
may be delayed by the reception of a high prio CAN frame. To
reflect the correct* traffic on the node the loopback of the sent
data has to be performed right after a successful transmission. If
@@ -481,7 +481,7 @@ solution for a couple of reasons:
- stats_timer: To calculate the Socket CAN core statistics
(e.g. current/maximum frames per second) this 1 second timer is
invoked at can.ko module start time by default. This timer can be
- disabled by using stattimer=0 on the module comandline.
+ disabled by using stattimer=0 on the module commandline.
- debug: (removed since SocketCAN SVN r546)
diff --git a/Documentation/networking/dm9000.txt b/Documentation/networking/dm9000.txt
new file mode 100644
index 00000000000..65df3dea556
--- /dev/null
+++ b/Documentation/networking/dm9000.txt
@@ -0,0 +1,167 @@
+DM9000 Network driver
+=====================
+
+Copyright 2008 Simtec Electronics,
+ Ben Dooks <ben@simtec.co.uk> <ben-linux@fluff.org>
+
+
+Introduction
+------------
+
+This file describes how to use the DM9000 platform-device based network driver
+that is contained in the files drivers/net/dm9000.c and drivers/net/dm9000.h.
+
+The driver supports three DM9000 variants, the DM9000E which is the first chip
+supported as well as the newer DM9000A and DM9000B devices. It is currently
+maintained and tested by Ben Dooks, who should be CC: to any patches for this
+driver.
+
+
+Defining the platform device
+----------------------------
+
+The minimum set of resources attached to the platform device are as follows:
+
+ 1) The physical address of the address register
+ 2) The physical address of the data register
+ 3) The IRQ line the device's interrupt pin is connected to.
+
+These resources should be specified in that order, as the ordering of the
+two address regions is important (the driver expects these to be address
+and then data).
+
+An example from arch/arm/mach-s3c2410/mach-bast.c is:
+
+static struct resource bast_dm9k_resource[] = {
+ [0] = {
+ .start = S3C2410_CS5 + BAST_PA_DM9000,
+ .end = S3C2410_CS5 + BAST_PA_DM9000 + 3,
+ .flags = IORESOURCE_MEM,
+ },
+ [1] = {
+ .start = S3C2410_CS5 + BAST_PA_DM9000 + 0x40,
+ .end = S3C2410_CS5 + BAST_PA_DM9000 + 0x40 + 0x3f,
+ .flags = IORESOURCE_MEM,
+ },
+ [2] = {
+ .start = IRQ_DM9000,
+ .end = IRQ_DM9000,
+ .flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL,
+ }
+};
+
+static struct platform_device bast_device_dm9k = {
+ .name = "dm9000",
+ .id = 0,
+ .num_resources = ARRAY_SIZE(bast_dm9k_resource),
+ .resource = bast_dm9k_resource,
+};
+
+Note the setting of the IRQ trigger flag in bast_dm9k_resource[2].flags,
+as this will generate a warning if it is not present. The trigger from
+the flags field will be passed to request_irq() when registering the IRQ
+handler to ensure that the IRQ is setup correctly.
+
+This shows a typical platform device, without the optional configuration
+platform data supplied. The next example uses the same resources, but adds
+the optional platform data to pass extra configuration data:
+
+static struct dm9000_plat_data bast_dm9k_platdata = {
+ .flags = DM9000_PLATF_16BITONLY,
+};
+
+static struct platform_device bast_device_dm9k = {
+ .name = "dm9000",
+ .id = 0,
+ .num_resources = ARRAY_SIZE(bast_dm9k_resource),
+ .resource = bast_dm9k_resource,
+ .dev = {
+ .platform_data = &bast_dm9k_platdata,
+ }
+};
+
+The platform data is defined in include/linux/dm9000.h and described below.
+
+
+Platform data
+-------------
+
+Extra platform data for the DM9000 can describe the IO bus width to the
+device, whether or not an external PHY is attached to the device and
+the availability of an external configuration EEPROM.
+
+The flags for the platform data .flags field are as follows:
+
+DM9000_PLATF_8BITONLY
+
+ The IO should be done with 8bit operations.
+
+DM9000_PLATF_16BITONLY
+
+ The IO should be done with 16bit operations.
+
+DM9000_PLATF_32BITONLY
+
+ The IO should be done with 32bit operations.
+
+DM9000_PLATF_EXT_PHY
+
+ The chip is connected to an external PHY.
+
+DM9000_PLATF_NO_EEPROM
+
+ This can be used to signify that the board does not have an
+ EEPROM, or that the EEPROM should be hidden from the user.
+
+DM9000_PLATF_SIMPLE_PHY
+
+ Switch to using the simpler PHY polling method which does not
+ try and read the MII PHY state regularly. This is only available
+ when using the internal PHY. See the section on link state polling
+ for more information.
+
+ The config symbol DM9000_FORCE_SIMPLE_PHY_POLL, Kconfig entry
+ "Force simple NSR based PHY polling" allows this flag to be
+ forced on at build time.
+
+
+PHY Link state polling
+----------------------
+
+The driver keeps track of the link state and informs the network core
+about link (carrier) availablilty. This is managed by several methods
+depending on the version of the chip and on which PHY is being used.
+
+For the internal PHY, the original (and currently default) method is
+to read the MII state, either when the status changes if we have the
+necessary interrupt support in the chip or every two seconds via a
+periodic timer.
+
+To reduce the overhead for the internal PHY, there is now the option
+of using the DM9000_FORCE_SIMPLE_PHY_POLL config, or DM9000_PLATF_SIMPLE_PHY
+platform data option to read the summary information without the
+expensive MII accesses. This method is faster, but does not print
+as much information.
+
+When using an external PHY, the driver currently has to poll the MII
+link status as there is no method for getting an interrupt on link change.
+
+
+DM9000A / DM9000B
+-----------------
+
+These chips are functionally similar to the DM9000E and are supported easily
+by the same driver. The features are:
+
+ 1) Interrupt on internal PHY state change. This means that the periodic
+ polling of the PHY status may be disabled on these devices when using
+ the internal PHY.
+
+ 2) TCP/UDP checksum offloading, which the driver does not currently support.
+
+
+ethtool
+-------
+
+The driver supports the ethtool interface for access to the driver
+state information, the PHY state and the EEPROM.
diff --git a/Documentation/networking/e1000.txt b/Documentation/networking/e1000.txt
index 61b171cf531..2df71861e57 100644
--- a/Documentation/networking/e1000.txt
+++ b/Documentation/networking/e1000.txt
@@ -513,21 +513,11 @@ Additional Configurations
Intel(R) PRO/1000 PT Dual Port Server Connection
Intel(R) PRO/1000 PT Dual Port Server Adapter
Intel(R) PRO/1000 PF Dual Port Server Adapter
- Intel(R) PRO/1000 PT Quad Port Server Adapter
+ Intel(R) PRO/1000 PT Quad Port Server Adapter
NAPI
----
- NAPI (Rx polling mode) is supported in the e1000 driver. NAPI is enabled
- or disabled based on the configuration of the kernel. To override
- the default, use the following compile-time flags.
-
- To enable NAPI, compile the driver module, passing in a configuration option:
-
- make CFLAGS_EXTRA=-DE1000_NAPI install
-
- To disable NAPI, compile the driver module, passing in a configuration option:
-
- make CFLAGS_EXTRA=-DE1000_NO_NAPI install
+ NAPI (Rx polling mode) is enabled in the e1000 driver.
See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI.
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 17a6e46fbd4..d84932650fd 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -81,23 +81,23 @@ inet_peer_minttl - INTEGER
Minimum time-to-live of entries. Should be enough to cover fragment
time-to-live on the reassembling side. This minimum time-to-live is
guaranteed if the pool size is less than inet_peer_threshold.
- Measured in jiffies(1).
+ Measured in seconds.
inet_peer_maxttl - INTEGER
Maximum time-to-live of entries. Unused entries will expire after
this period of time if there is no memory pressure on the pool (i.e.
when the number of entries in the pool is very small).
- Measured in jiffies(1).
+ Measured in seconds.
inet_peer_gc_mintime - INTEGER
Minimum interval between garbage collection passes. This interval is
in effect under high memory pressure on the pool.
- Measured in jiffies(1).
+ Measured in seconds.
inet_peer_gc_maxtime - INTEGER
Minimum interval between garbage collection passes. This interval is
in effect under low (or absent) memory pressure on the pool.
- Measured in jiffies(1).
+ Measured in seconds.
TCP variables:
@@ -148,9 +148,9 @@ tcp_available_congestion_control - STRING
but not loaded.
tcp_base_mss - INTEGER
- The initial value of search_low to be used by Packetization Layer
- Path MTU Discovery (MTU probing). If MTU probing is enabled,
- this is the inital MSS used by the connection.
+ The initial value of search_low to be used by the packetization layer
+ Path MTU discovery (MTU probing). If MTU probing is enabled,
+ this is the initial MSS used by the connection.
tcp_congestion_control - STRING
Set the congestion control algorithm to be used for new
@@ -185,10 +185,9 @@ tcp_frto - INTEGER
timeouts. It is particularly beneficial in wireless environments
where packet loss is typically due to random radio interference
rather than intermediate router congestion. F-RTO is sender-side
- only modification. Therefore it does not require any support from
- the peer, but in a typical case, however, where wireless link is
- the local access link and most of the data flows downlink, the
- faraway servers should have F-RTO enabled to take advantage of it.
+ only modification. Therefore it does not require any support from
+ the peer.
+
If set to 1, basic version is enabled. 2 enables SACK enhanced
F-RTO if flow uses SACK. The basic version can be used also when
SACK is in use though scenario(s) with it exists where F-RTO
@@ -276,7 +275,7 @@ tcp_mem - vector of 3 INTEGERs: min, pressure, max
memory.
tcp_moderate_rcvbuf - BOOLEAN
- If set, TCP performs receive buffer autotuning, attempting to
+ If set, TCP performs receive buffer auto-tuning, attempting to
automatically size the buffer (no greater than tcp_rmem[2]) to
match the size required by the path for full throughput. Enabled by
default.
@@ -336,7 +335,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
pressure.
Default: 8K
- default: default size of receive buffer used by TCP sockets.
+ default: initial size of receive buffer used by TCP sockets.
This value overrides net.core.rmem_default used by other protocols.
Default: 87380 bytes. This value results in window of 65535 with
default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit
@@ -344,8 +343,10 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
max: maximal size of receive buffer allowed for automatically
selected receiver buffers for TCP socket. This value does not override
- net.core.rmem_max, "static" selection via SO_RCVBUF does not use this.
- Default: 87380*2 bytes.
+ net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables
+ automatic tuning of that socket's receive buffer size, in which
+ case this value is ignored.
+ Default: between 87380B and 4MB, depending on RAM size.
tcp_sack - BOOLEAN
Enable select acknowledgments (SACKS).
@@ -358,7 +359,7 @@ tcp_slow_start_after_idle - BOOLEAN
Default: 1
tcp_stdurg - BOOLEAN
- Use the Host requirements interpretation of the TCP urg pointer field.
+ Use the Host requirements interpretation of the TCP urgent pointer field.
Most hosts use the older BSD interpretation, so if you turn this on
Linux might not communicate correctly with them.
Default: FALSE
@@ -371,12 +372,12 @@ tcp_synack_retries - INTEGER
tcp_syncookies - BOOLEAN
Only valid when the kernel was compiled with CONFIG_SYNCOOKIES
Send out syncookies when the syn backlog queue of a socket
- overflows. This is to prevent against the common 'syn flood attack'
+ overflows. This is to prevent against the common 'SYN flood attack'
Default: FALSE
Note, that syncookies is fallback facility.
It MUST NOT be used to help highly loaded servers to stand
- against legal connection rate. If you see synflood warnings
+ against legal connection rate. If you see SYN flood warnings
in your logs, but investigation shows that they occur
because of overload with legal connections, you should tune
another parameters until this warning disappear.
@@ -386,7 +387,7 @@ tcp_syncookies - BOOLEAN
to use TCP extensions, can result in serious degradation
of some services (f.e. SMTP relaying), visible not by you,
but your clients and relays, contacting you. While you see
- synflood warnings in logs not being really flooded, your server
+ SYN flood warnings in logs not being really flooded, your server
is seriously misconfigured.
tcp_syn_retries - INTEGER
@@ -419,19 +420,21 @@ tcp_window_scaling - BOOLEAN
Enable window scaling as defined in RFC1323.
tcp_wmem - vector of 3 INTEGERs: min, default, max
- min: Amount of memory reserved for send buffers for TCP socket.
+ min: Amount of memory reserved for send buffers for TCP sockets.
Each TCP socket has rights to use it due to fact of its birth.
Default: 4K
- default: Amount of memory allowed for send buffers for TCP socket
- by default. This value overrides net.core.wmem_default used
- by other protocols, it is usually lower than net.core.wmem_default.
+ default: initial size of send buffer used by TCP sockets. This
+ value overrides net.core.wmem_default used by other protocols.
+ It is usually lower than net.core.wmem_default.
Default: 16K
- max: Maximal amount of memory allowed for automatically selected
- send buffers for TCP socket. This value does not override
- net.core.wmem_max, "static" selection via SO_SNDBUF does not use this.
- Default: 128K
+ max: Maximal amount of memory allowed for automatically tuned
+ send buffers for TCP sockets. This value does not override
+ net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables
+ automatic tuning of that socket's send buffer size, in which case
+ this value is ignored.
+ Default: between 64K and 4MB, depending on RAM size.
tcp_workaround_signed_windows - BOOLEAN
If set, assume no receipt of a window scaling option means the
@@ -548,8 +551,9 @@ icmp_echo_ignore_broadcasts - BOOLEAN
icmp_ratelimit - INTEGER
Limit the maximal rates for sending ICMP packets whose type matches
icmp_ratemask (see below) to specific targets.
- 0 to disable any limiting, otherwise the maximal rate in jiffies(1)
- Default: 100
+ 0 to disable any limiting,
+ otherwise the minimal space between responses in milliseconds.
+ Default: 1000
icmp_ratemask - INTEGER
Mask made of ICMP types for which rates are being limited.
@@ -794,10 +798,6 @@ tag - INTEGER
Allows you to write a number, which can be used as required.
Default value is 0.
-(1) Jiffie: internal timeunit for the kernel. On the i386 1/100s, on the
-Alpha 1/1024s. See the HZ define in /usr/include/asm/param.h for the exact
-value on your system.
-
Alexey Kuznetsov.
kuznet@ms2.inr.ac.ru
@@ -1024,11 +1024,23 @@ max_addresses - INTEGER
autoconfigured addresses.
Default: 16
+disable_ipv6 - BOOLEAN
+ Disable IPv6 operation.
+ Default: FALSE (enable IPv6 operation)
+
+accept_dad - INTEGER
+ Whether to accept DAD (Duplicate Address Detection).
+ 0: Disable DAD
+ 1: Enable DAD (default)
+ 2: Enable DAD, and disable IPv6 operation if MAC-based duplicate
+ link-local address has been found.
+
icmp/*:
ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 packets.
- 0 to disable any limiting, otherwise the maximal rate in jiffies(1)
- Default: 100
+ 0 to disable any limiting,
+ otherwise the minimal space between responses in milliseconds.
+ Default: 1000
IPv6 Update by:
@@ -1064,24 +1076,193 @@ bridge-nf-filter-pppoe-tagged - BOOLEAN
Default: 1
-UNDOCUMENTED:
+proc/sys/net/sctp/* Variables:
+
+addip_enable - BOOLEAN
+ Enable or disable extension of Dynamic Address Reconfiguration
+ (ADD-IP) functionality specified in RFC5061. This extension provides
+ the ability to dynamically add and remove new addresses for the SCTP
+ associations.
+
+ 1: Enable extension.
+
+ 0: Disable extension.
+
+ Default: 0
+
+addip_noauth_enable - BOOLEAN
+ Dynamic Address Reconfiguration (ADD-IP) requires the use of
+ authentication to protect the operations of adding or removing new
+ addresses. This requirement is mandated so that unauthorized hosts
+ would not be able to hijack associations. However, older
+ implementations may not have implemented this requirement while
+ allowing the ADD-IP extension. For reasons of interoperability,
+ we provide this variable to control the enforcement of the
+ authentication requirement.
+
+ 1: Allow ADD-IP extension to be used without authentication. This
+ should only be set in a closed environment for interoperability
+ with older implementations.
+
+ 0: Enforce the authentication requirement
+
+ Default: 0
+
+auth_enable - BOOLEAN
+ Enable or disable Authenticated Chunks extension. This extension
+ provides the ability to send and receive authenticated chunks and is
+ required for secure operation of Dynamic Address Reconfiguration
+ (ADD-IP) extension.
+
+ 1: Enable this extension.
+ 0: Disable this extension.
+
+ Default: 0
+
+prsctp_enable - BOOLEAN
+ Enable or disable the Partial Reliability extension (RFC3758) which
+ is used to notify peers that a given DATA should no longer be expected.
+
+ 1: Enable extension
+ 0: Disable
+
+ Default: 1
+
+max_burst - INTEGER
+ The limit of the number of new packets that can be initially sent. It
+ controls how bursty the generated traffic can be.
+
+ Default: 4
+
+association_max_retrans - INTEGER
+ Set the maximum number for retransmissions that an association can
+ attempt deciding that the remote end is unreachable. If this value
+ is exceeded, the association is terminated.
+
+ Default: 10
+
+max_init_retransmits - INTEGER
+ The maximum number of retransmissions of INIT and COOKIE-ECHO chunks
+ that an association will attempt before declaring the destination
+ unreachable and terminating.
+
+ Default: 8
+
+path_max_retrans - INTEGER
+ The maximum number of retransmissions that will be attempted on a given
+ path. Once this threshold is exceeded, the path is considered
+ unreachable, and new traffic will use a different path when the
+ association is multihomed.
+
+ Default: 5
-dev_weight FIXME
-discovery_slots FIXME
-discovery_timeout FIXME
-fast_poll_increase FIXME
-ip6_queue_maxlen FIXME
-lap_keepalive_time FIXME
-lo_cong FIXME
-max_baud_rate FIXME
-max_dgram_qlen FIXME
-max_noreply_time FIXME
-max_tx_data_size FIXME
-max_tx_window FIXME
-min_tx_turn_time FIXME
-mod_cong FIXME
-no_cong FIXME
-no_cong_thresh FIXME
-slot_timeout FIXME
-warn_noreply_time FIXME
+rto_initial - INTEGER
+ The initial round trip timeout value in milliseconds that will be used
+ in calculating round trip times. This is the initial time interval
+ for retransmissions.
+
+ Default: 3000
+
+rto_max - INTEGER
+ The maximum value (in milliseconds) of the round trip timeout. This
+ is the largest time interval that can elapse between retransmissions.
+
+ Default: 60000
+
+rto_min - INTEGER
+ The minimum value (in milliseconds) of the round trip timeout. This
+ is the smallest time interval the can elapse between retransmissions.
+
+ Default: 1000
+
+hb_interval - INTEGER
+ The interval (in milliseconds) between HEARTBEAT chunks. These chunks
+ are sent at the specified interval on idle paths to probe the state of
+ a given path between 2 associations.
+
+ Default: 30000
+
+sack_timeout - INTEGER
+ The amount of time (in milliseconds) that the implementation will wait
+ to send a SACK.
+
+ Default: 200
+
+valid_cookie_life - INTEGER
+ The default lifetime of the SCTP cookie (in milliseconds). The cookie
+ is used during association establishment.
+
+ Default: 60000
+
+cookie_preserve_enable - BOOLEAN
+ Enable or disable the ability to extend the lifetime of the SCTP cookie
+ that is used during the establishment phase of SCTP association
+
+ 1: Enable cookie lifetime extension.
+ 0: Disable
+
+ Default: 1
+
+rcvbuf_policy - INTEGER
+ Determines if the receive buffer is attributed to the socket or to
+ association. SCTP supports the capability to create multiple
+ associations on a single socket. When using this capability, it is
+ possible that a single stalled association that's buffering a lot
+ of data may block other associations from delivering their data by
+ consuming all of the receive buffer space. To work around this,
+ the rcvbuf_policy could be set to attribute the receiver buffer space
+ to each association instead of the socket. This prevents the described
+ blocking.
+
+ 1: rcvbuf space is per association
+ 0: recbuf space is per socket
+
+ Default: 0
+
+sndbuf_policy - INTEGER
+ Similar to rcvbuf_policy above, this applies to send buffer space.
+
+ 1: Send buffer is tracked per association
+ 0: Send buffer is tracked per socket.
+
+ Default: 0
+
+sctp_mem - vector of 3 INTEGERs: min, pressure, max
+ Number of pages allowed for queueing by all SCTP sockets.
+
+ min: Below this number of pages SCTP is not bothered about its
+ memory appetite. When amount of memory allocated by SCTP exceeds
+ this number, SCTP starts to moderate memory usage.
+
+ pressure: This value was introduced to follow format of tcp_mem.
+
+ max: Number of pages allowed for queueing by all SCTP sockets.
+
+ Default is calculated at boot time from amount of available memory.
+
+sctp_rmem - vector of 3 INTEGERs: min, default, max
+ See tcp_rmem for a description.
+
+sctp_wmem - vector of 3 INTEGERs: min, default, max
+ See tcp_wmem for a description.
+
+UNDOCUMENTED:
+/proc/sys/net/core/*
+ dev_weight FIXME
+
+/proc/sys/net/unix/*
+ max_dgram_qlen FIXME
+
+/proc/sys/net/irda/*
+ fast_poll_increase FIXME
+ warn_noreply_time FIXME
+ discovery_slots FIXME
+ slot_timeout FIXME
+ max_baud_rate FIXME
+ discovery_timeout FIXME
+ lap_keepalive_time FIXME
+ max_noreply_time FIXME
+ max_tx_data_size FIXME
+ max_tx_window FIXME
+ min_tx_turn_time FIXME
diff --git a/Documentation/networking/ixgb.txt b/Documentation/networking/ixgb.txt
index 7c98277777e..a0d0ffb5e58 100644
--- a/Documentation/networking/ixgb.txt
+++ b/Documentation/networking/ixgb.txt
@@ -1,7 +1,7 @@
-Linux* Base Driver for the Intel(R) PRO/10GbE Family of Adapters
-================================================================
+Linux Base Driver for 10 Gigabit Intel(R) Network Connection
+=============================================================
-November 17, 2004
+October 9, 2007
Contents
@@ -9,94 +9,151 @@ Contents
- In This Release
- Identifying Your Adapter
+- Building and Installation
- Command Line Parameters
- Improving Performance
+- Additional Configurations
+- Known Issues/Troubleshooting
- Support
+
In This Release
===============
-This file describes the Linux* Base Driver for the Intel(R) PRO/10GbE Family
-of Adapters, version 1.0.x.
+This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R)
+Network Connection. This driver includes support for Itanium(R)2-based
+systems.
+
+For questions related to hardware requirements, refer to the documentation
+supplied with your 10 Gigabit adapter. All hardware requirements listed apply
+to use with Linux.
+
+The following features are available in this kernel:
+ - Native VLANs
+ - Channel Bonding (teaming)
+ - SNMP
+
+Channel Bonding documentation can be found in the Linux kernel source:
+/Documentation/networking/bonding.txt
+
+The driver information previously displayed in the /proc filesystem is not
+supported in this release. Alternatively, you can use ethtool (version 1.6
+or later), lspci, and ifconfig to obtain the same information.
+
+Instructions on updating ethtool can be found in the section "Additional
+Configurations" later in this document.
-For questions related to hardware requirements, refer to the documentation
-supplied with your Intel PRO/10GbE adapter. All hardware requirements listed
-apply to use with Linux.
Identifying Your Adapter
========================
-To verify your Intel adapter is supported, find the board ID number on the
-adapter. Look for a label that has a barcode and a number in the format
-A12345-001.
+The following Intel network adapters are compatible with the drivers in this
+release:
+
+Controller Adapter Name Physical Layer
+---------- ------------ --------------
+82597EX Intel(R) PRO/10GbE LR/SR/CX4 10G Base-LR (1310 nm optical fiber)
+ Server Adapters 10G Base-SR (850 nm optical fiber)
+ 10G Base-CX4(twin-axial copper cabling)
+
+For more information on how to identify your adapter, go to the Adapter &
+Driver ID Guide at:
+
+ http://support.intel.com/support/network/sb/CS-012904.htm
+
+
+Building and Installation
+=========================
+
+select m for "Intel(R) PRO/10GbE support" located at:
+ Location:
+ -> Device Drivers
+ -> Network device support (NETDEVICES [=y])
+ -> Ethernet (10000 Mbit) (NETDEV_10000 [=y])
+1. make modules && make modules_install
+
+2. Load the module:
+
+    modprobe ixgb <parameter>=<value>
+
+ The insmod command can be used if the full
+ path to the driver module is specified. For example:
+
+ insmod /lib/modules/<KERNEL VERSION>/kernel/drivers/net/ixgb/ixgb.ko
+
+ With 2.6 based kernels also make sure that older ixgb drivers are
+ removed from the kernel, before loading the new module:
-Use the above information and the Adapter & Driver ID Guide at:
+ rmmod ixgb; modprobe ixgb
- http://support.intel.com/support/network/adapter/pro100/21397.htm
+3. Assign an IP address to the interface by entering the following, where
+ x is the interface number:
-For the latest Intel network drivers for Linux, go to:
+ ifconfig ethx <IP_address>
+
+4. Verify that the interface works. Enter the following, where <IP_address>
+ is the IP address for another machine on the same subnet as the interface
+ that is being tested:
+
+ ping <IP_address>
- http://downloadfinder.intel.com/scripts-df/support_intel.asp
Command Line Parameters
=======================
-If the driver is built as a module, the following optional parameters are
-used by entering them on the command line with the modprobe or insmod command
-using this syntax:
+If the driver is built as a module, the following optional parameters are
+used by entering them on the command line with the modprobe command using
+this syntax:
modprobe ixgb [<option>=<VAL1>,<VAL2>,...]
- insmod ixgb [<option>=<VAL1>,<VAL2>,...]
+For example, with two 10GbE PCI adapters, entering:
-For example, with two PRO/10GbE PCI adapters, entering:
+ modprobe ixgb TxDescriptors=80,128
- insmod ixgb TxDescriptors=80,128
-
-loads the ixgb driver with 80 TX resources for the first adapter and 128 TX
+loads the ixgb driver with 80 TX resources for the first adapter and 128 TX
resources for the second adapter.
The default value for each parameter is generally the recommended setting,
-unless otherwise noted. Also, if the driver is statically built into the
-kernel, the driver is loaded with the default values for all the parameters.
-Ethtool can be used to change some of the parameters at runtime.
+unless otherwise noted.
FlowControl
Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
Default: Read from the EEPROM
- If EEPROM is not detected, default is 3
- This parameter controls the automatic generation(Tx) and response(Rx) to
- Ethernet PAUSE frames.
+ If EEPROM is not detected, default is 1
+ This parameter controls the automatic generation(Tx) and response(Rx) to
+ Ethernet PAUSE frames. There are hardware bugs associated with enabling
+ Tx flow control so beware.
RxDescriptors
Valid Range: 64-512
Default Value: 512
- This value is the number of receive descriptors allocated by the driver.
- Increasing this value allows the driver to buffer more incoming packets.
- Each descriptor is 16 bytes. A receive buffer is also allocated for
- each descriptor and can be either 2048, 4056, 8192, or 16384 bytes,
- depending on the MTU setting. When the MTU size is 1500 or less, the
+ This value is the number of receive descriptors allocated by the driver.
+ Increasing this value allows the driver to buffer more incoming packets.
+ Each descriptor is 16 bytes. A receive buffer is also allocated for
+ each descriptor and can be either 2048, 4056, 8192, or 16384 bytes,
+ depending on the MTU setting. When the MTU size is 1500 or less, the
receive buffer size is 2048 bytes. When the MTU is greater than 1500 the
- receive buffer size will be either 4056, 8192, or 16384 bytes. The
+ receive buffer size will be either 4056, 8192, or 16384 bytes. The
maximum MTU size is 16114.
RxIntDelay
Valid Range: 0-65535 (0=off)
-Default Value: 6
- This value delays the generation of receive interrupts in units of
- 0.8192 microseconds. Receive interrupt reduction can improve CPU
- efficiency if properly tuned for specific network traffic. Increasing
- this value adds extra latency to frame reception and can end up
- decreasing the throughput of TCP traffic. If the system is reporting
- dropped receives, this value may be set too high, causing the driver to
+Default Value: 72
+ This value delays the generation of receive interrupts in units of
+ 0.8192 microseconds. Receive interrupt reduction can improve CPU
+ efficiency if properly tuned for specific network traffic. Increasing
+ this value adds extra latency to frame reception and can end up
+ decreasing the throughput of TCP traffic. If the system is reporting
+ dropped receives, this value may be set too high, causing the driver to
run out of available receive descriptors.
TxDescriptors
Valid Range: 64-4096
Default Value: 256
This value is the number of transmit descriptors allocated by the driver.
- Increasing this value allows the driver to queue more transmits. Each
+ Increasing this value allows the driver to queue more transmits. Each
descriptor is 16 bytes.
XsumRX
@@ -105,51 +162,49 @@ Default Value: 1
A value of '1' indicates that the driver should enable IP checksum
offload for received packets (both UDP and TCP) to the adapter hardware.
-XsumTX
-Valid Range: 0-1
-Default Value: 1
- A value of '1' indicates that the driver should enable IP checksum
- offload for transmitted packets (both UDP and TCP) to the adapter
- hardware.
Improving Performance
=====================
-With the Intel PRO/10 GbE adapter, the default Linux configuration will very
-likely limit the total available throughput artificially. There is a set of
-things that when applied together increase the ability of Linux to transmit
-and receive data. The following enhancements were originally acquired from
-settings published at http://www.spec.org/web99 for various submitted results
-using Linux.
+With the 10 Gigabit server adapters, the default Linux configuration will
+very likely limit the total available throughput artificially. There is a set
+of configuration changes that, when applied together, will increase the ability
+of Linux to transmit and receive data. The following enhancements were
+originally acquired from settings published at http://www.spec.org/web99/ for
+various submitted results using Linux.
-NOTE: These changes are only suggestions, and serve as a starting point for
-tuning your network performance.
+NOTE: These changes are only suggestions, and serve as a starting point for
+ tuning your network performance.
The changes are made in three major ways, listed in order of greatest effect:
-- Use ifconfig to modify the mtu (maximum transmission unit) and the txqueuelen
+- Use ifconfig to modify the mtu (maximum transmission unit) and the txqueuelen
parameter.
- Use sysctl to modify /proc parameters (essentially kernel tuning)
-- Use setpci to modify the MMRBC field in PCI-X configuration space to increase
+- Use setpci to modify the MMRBC field in PCI-X configuration space to increase
transmit burst lengths on the bus.
-NOTE: setpci modifies the adapter's configuration registers to allow it to read
-up to 4k bytes at a time (for transmits). However, for some systems the
-behavior after modifying this register may be undefined (possibly errors of some
-kind). A power-cycle, hard reset or explicitly setting the e6 register back to
-22 (setpci -d 8086:1048 e6.b=22) may be required to get back to a stable
-configuration.
+NOTE: setpci modifies the adapter's configuration registers to allow it to read
+up to 4k bytes at a time (for transmits). However, for some systems the
+behavior after modifying this register may be undefined (possibly errors of
+some kind). A power-cycle, hard reset or explicitly setting the e6 register
+back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a
+stable configuration.
- COPY these lines and paste them into ixgb_perf.sh:
#!/bin/bash
-echo "configuring network performance , edit this file to change the interface"
+echo "configuring network performance , edit this file to change the interface
+or device ID of 10GbE card"
# set mmrbc to 4k reads, modify only Intel 10GbE device IDs
-setpci -d 8086:1048 e6.b=2e
-# set the MTU (max transmission unit) - it requires your switch and clients to change too!
+# replace 1a48 with appropriate 10GbE device's ID installed on the system,
+# if needed.
+setpci -d 8086:1a48 e6.b=2e
+# set the MTU (max transmission unit) - it requires your switch and clients
+# to change as well.
# set the txqueuelen
# your ixgb adapter should be loaded as eth1 for this to work, change if needed
ifconfig eth1 mtu 9000 txqueuelen 1000 up
-# call the sysctl utility to modify /proc/sys entries
-sysctl -p ./sysctl_ixgb.conf
+# call the sysctl utility to modify /proc/sys entries
+sysctl -p ./sysctl_ixgb.conf
- END ixgb_perf.sh
- COPY these lines and paste them into sysctl_ixgb.conf:
@@ -159,54 +214,220 @@ sysctl -p ./sysctl_ixgb.conf
# several network benchmark tests, your mileage may vary
### IPV4 specific settings
-net.ipv4.tcp_timestamps = 0 # turns TCP timestamp support off, default 1, reduces CPU use
-net.ipv4.tcp_sack = 0 # turn SACK support off, default on
-# on systems with a VERY fast bus -> memory interface this is the big gainer
-net.ipv4.tcp_rmem = 10000000 10000000 10000000 # sets min/default/max TCP read buffer, default 4096 87380 174760
-net.ipv4.tcp_wmem = 10000000 10000000 10000000 # sets min/pressure/max TCP write buffer, default 4096 16384 131072
-net.ipv4.tcp_mem = 10000000 10000000 10000000 # sets min/pressure/max TCP buffer space, default 31744 32256 32768
+# turn TCP timestamp support off, default 1, reduces CPU use
+net.ipv4.tcp_timestamps = 0
+# turn SACK support off, default on
+# on systems with a VERY fast bus -> memory interface this is the big gainer
+net.ipv4.tcp_sack = 0
+# set min/default/max TCP read buffer, default 4096 87380 174760
+net.ipv4.tcp_rmem = 10000000 10000000 10000000
+# set min/pressure/max TCP write buffer, default 4096 16384 131072
+net.ipv4.tcp_wmem = 10000000 10000000 10000000
+# set min/pressure/max TCP buffer space, default 31744 32256 32768
+net.ipv4.tcp_mem = 10000000 10000000 10000000
### CORE settings (mostly for socket and UDP effect)
-net.core.rmem_max = 524287 # maximum receive socket buffer size, default 131071
-net.core.wmem_max = 524287 # maximum send socket buffer size, default 131071
-net.core.rmem_default = 524287 # default receive socket buffer size, default 65535
-net.core.wmem_default = 524287 # default send socket buffer size, default 65535
-net.core.optmem_max = 524287 # maximum amount of option memory buffers, default 10240
-net.core.netdev_max_backlog = 300000 # number of unprocessed input packets before kernel starts dropping them, default 300
+# set maximum receive socket buffer size, default 131071
+net.core.rmem_max = 524287
+# set maximum send socket buffer size, default 131071
+net.core.wmem_max = 524287
+# set default receive socket buffer size, default 65535
+net.core.rmem_default = 524287
+# set default send socket buffer size, default 65535
+net.core.wmem_default = 524287
+# set maximum amount of option memory buffers, default 10240
+net.core.optmem_max = 524287
+# set number of unprocessed input packets before kernel starts dropping them; default 300
+net.core.netdev_max_backlog = 300000
- END sysctl_ixgb.conf
-Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface
-your ixgb driver is using.
+Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface
+your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's
+ID installed on the system.
-NOTE: Unless these scripts are added to the boot process, these changes will
-only last only until the next system reboot.
+NOTE: Unless these scripts are added to the boot process, these changes will
+ only last only until the next system reboot.
Resolving Slow UDP Traffic
--------------------------
+If your server does not seem to be able to receive UDP traffic as fast as it
+can receive TCP traffic, it could be because Linux, by default, does not set
+the network stack buffers as large as they need to be to support high UDP
+transfer rates. One way to alleviate this problem is to allow more memory to
+be used by the IP stack to store incoming data.
-If your server does not seem to be able to receive UDP traffic as fast as it
-can receive TCP traffic, it could be because Linux, by default, does not set
-the network stack buffers as large as they need to be to support high UDP
-transfer rates. One way to alleviate this problem is to allow more memory to
-be used by the IP stack to store incoming data.
-
-For instance, use the commands:
+For instance, use the commands:
sysctl -w net.core.rmem_max=262143
and
sysctl -w net.core.rmem_default=262143
-to increase the read buffer memory max and default to 262143 (256k - 1) from
-defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables
-will increase the amount of memory used by the network stack for receives, and
+to increase the read buffer memory max and default to 262143 (256k - 1) from
+defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables
+will increase the amount of memory used by the network stack for receives, and
can be increased significantly more if necessary for your application.
+
+Additional Configurations
+=========================
+
+ Configuring the Driver on Different Distributions
+ -------------------------------------------------
+ Configuring a network driver to load properly when the system is started is
+ distribution dependent. Typically, the configuration process involves adding
+ an alias line to /etc/modprobe.conf as well as editing other system startup
+ scripts and/or configuration files. Many popular Linux distributions ship
+ with tools to make these changes for you. To learn the proper way to
+ configure a network device for your system, refer to your distribution
+ documentation. If during this process you are asked for the driver or module
+ name, the name for the Linux Base Driver for the Intel 10GbE Family of
+ Adapters is ixgb.
+
+ Viewing Link Messages
+ ---------------------
+ Link messages will not be displayed to the console if the distribution is
+ restricting system messages. In order to see network driver link messages on
+ your console, set dmesg to eight by entering the following:
+
+ dmesg -n 8
+
+ NOTE: This setting is not saved across reboots.
+
+
+ Jumbo Frames
+ ------------
+ The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
+ enabled by changing the MTU to a value larger than the default of 1500.
+ The maximum value for the MTU is 16114. Use the ifconfig command to
+ increase the MTU size. For example:
+
+ ifconfig ethx mtu 9000 up
+
+ The maximum MTU setting for Jumbo Frames is 16114. This value coincides
+ with the maximum Jumbo Frames size of 16128.
+
+
+ Ethtool
+ -------
+ The driver utilizes the ethtool interface for driver configuration and
+ diagnostics, as well as displaying statistical information. Ethtool
+ version 1.6 or later is required for this functionality.
+
+ The latest release of ethtool can be found from
+ http://sourceforge.net/projects/gkernel
+
+ NOTE: Ethtool 1.6 only supports a limited set of ethtool options. Support
+ for a more complete ethtool feature set can be enabled by upgrading
+ to the latest version.
+
+
+ NAPI
+ ----
+
+ NAPI (Rx polling mode) is supported in the ixgb driver. NAPI is enabled
+ or disabled based on the configuration of the kernel. see CONFIG_IXGB_NAPI
+
+ See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI.
+
+
+Known Issues/Troubleshooting
+============================
+
+ NOTE: After installing the driver, if your Intel Network Connection is not
+ working, verify in the "In This Release" section of the readme that you have
+ installed the correct driver.
+
+ Intel(R) PRO/10GbE CX4 Server Adapter Cable Interoperability Issue with
+ Fujitsu XENPAK Module in SmartBits Chassis
+ ---------------------------------------------------------------------
+ Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4
+ Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits
+ chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni.
+ The CRC errors may be received either by the Intel(R) PRO/10GbE CX4
+ Server adapter or the SmartBits. If this situation occurs using a different
+ cable assembly may resolve the issue.
+
+ CX4 Server Adapter Cable Interoperability Issues with HP Procurve 3400cl
+ Switch Port
+ ------------------------------------------------------------------------
+ Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server
+ adapter is connected to an HP Procurve 3400cl switch port using short cables
+ (1 m or shorter). If this situation occurs, using a longer cable may resolve
+ the issue.
+
+ Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that
+ Are 10 m or longer or where using a Leoni 15 m/24AWG cable assembly. The CRC
+ errors may be received either by the CX4 Server adapter or at the switch. If
+ this situation occurs, using a different cable assembly may resolve the issue.
+
+
+ Jumbo Frames System Requirement
+ -------------------------------
+ Memory allocation failures have been observed on Linux systems with 64 MB
+ of RAM or less that are running Jumbo Frames. If you are using Jumbo
+ Frames, your system may require more than the advertised minimum
+ requirement of 64 MB of system memory.
+
+
+ Performance Degradation with Jumbo Frames
+ -----------------------------------------
+ Degradation in throughput performance may be observed in some Jumbo frames
+ environments. If this is observed, increasing the application's socket buffer
+ size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
+ See the specific application manual and /usr/src/linux*/Documentation/
+ networking/ip-sysctl.txt for more details.
+
+
+ Allocating Rx Buffers when Using Jumbo Frames
+ ---------------------------------------------
+ Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if
+ the available memory is heavily fragmented. This issue may be seen with PCI-X
+ adapters or with packet split disabled. This can be reduced or eliminated
+ by changing the amount of available memory for receive buffer allocation, by
+ increasing /proc/sys/vm/min_free_kbytes.
+
+
+ Multiple Interfaces on Same Ethernet Broadcast Network
+ ------------------------------------------------------
+ Due to the default ARP behavior on Linux, it is not possible to have
+ one system on two IP networks in the same Ethernet broadcast domain
+ (non-partitioned switch) behave as expected. All Ethernet interfaces
+ will respond to IP traffic for any IP address assigned to the system.
+ This results in unbalanced receive traffic.
+
+ If you have multiple interfaces in a server, do either of the following:
+
+ - Turn on ARP filtering by entering:
+ echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
+
+ - Install the interfaces in separate broadcast domains - either in
+ different switches or in a switch partitioned to VLANs.
+
+
+ UDP Stress Test Dropped Packet Issue
+ --------------------------------------
+ Under small packets UDP stress test with 10GbE driver, the Linux system
+ may drop UDP packets due to the fullness of socket buffers. You may want
+ to change the driver's Flow Control variables to the minimum value for
+ controlling packet reception.
+
+
+ Tx Hangs Possible Under Stress
+ ------------------------------
+ Under stress conditions, if TX hangs occur, turning off TSO
+ "ethtool -K eth0 tso off" may resolve the problem.
+
+
Support
=======
-For general information and support, go to the Intel support website at:
+For general information, go to the Intel support website at:
http://support.intel.com
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+ http://sourceforge.net/projects/e1000
+
If an issue is identified with the released source code on the supported
-kernel with a supported adapter, email the specific information related to
-the issue to linux.nics@intel.com.
+kernel with a supported adapter, email the specific information related
+to the issue to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/mac80211_hwsim/README b/Documentation/networking/mac80211_hwsim/README
new file mode 100644
index 00000000000..2ff8ccb8dc3
--- /dev/null
+++ b/Documentation/networking/mac80211_hwsim/README
@@ -0,0 +1,67 @@
+mac80211_hwsim - software simulator of 802.11 radio(s) for mac80211
+Copyright (c) 2008, Jouni Malinen <j@w1.fi>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License version 2 as
+published by the Free Software Foundation.
+
+
+Introduction
+
+mac80211_hwsim is a Linux kernel module that can be used to simulate
+arbitrary number of IEEE 802.11 radios for mac80211. It can be used to
+test most of the mac80211 functionality and user space tools (e.g.,
+hostapd and wpa_supplicant) in a way that matches very closely with
+the normal case of using real WLAN hardware. From the mac80211 view
+point, mac80211_hwsim is yet another hardware driver, i.e., no changes
+to mac80211 are needed to use this testing tool.
+
+The main goal for mac80211_hwsim is to make it easier for developers
+to test their code and work with new features to mac80211, hostapd,
+and wpa_supplicant. The simulated radios do not have the limitations
+of real hardware, so it is easy to generate an arbitrary test setup
+and always reproduce the same setup for future tests. In addition,
+since all radio operation is simulated, any channel can be used in
+tests regardless of regulatory rules.
+
+mac80211_hwsim kernel module has a parameter 'radios' that can be used
+to select how many radios are simulated (default 2). This allows
+configuration of both very simply setups (e.g., just a single access
+point and a station) or large scale tests (multiple access points with
+hundreds of stations).
+
+mac80211_hwsim works by tracking the current channel of each virtual
+radio and copying all transmitted frames to all other radios that are
+currently enabled and on the same channel as the transmitting
+radio. Software encryption in mac80211 is used so that the frames are
+actually encrypted over the virtual air interface to allow more
+complete testing of encryption.
+
+A global monitoring netdev, hwsim#, is created independent of
+mac80211. This interface can be used to monitor all transmitted frames
+regardless of channel.
+
+
+Simple example
+
+This example shows how to use mac80211_hwsim to simulate two radios:
+one to act as an access point and the other as a station that
+associates with the AP. hostapd and wpa_supplicant are used to take
+care of WPA2-PSK authentication. In addition, hostapd is also
+processing access point side of association.
+
+Please note that the current Linux kernel does not enable AP mode, so a
+simple patch is needed to enable AP mode selection:
+http://johannes.sipsolutions.net/patches/kernel/all/LATEST/006-allow-ap-vlan-modes.patch
+
+
+# Build mac80211_hwsim as part of kernel configuration
+
+# Load the module
+modprobe mac80211_hwsim
+
+# Run hostapd (AP) for wlan0
+hostapd hostapd.conf
+
+# Run wpa_supplicant (station) for wlan1
+wpa_supplicant -Dwext -iwlan1 -c wpa_supplicant.conf
diff --git a/Documentation/networking/mac80211_hwsim/hostapd.conf b/Documentation/networking/mac80211_hwsim/hostapd.conf
new file mode 100644
index 00000000000..08cde7e35f2
--- /dev/null
+++ b/Documentation/networking/mac80211_hwsim/hostapd.conf
@@ -0,0 +1,11 @@
+interface=wlan0
+driver=nl80211
+
+hw_mode=g
+channel=1
+ssid=mac80211 test
+
+wpa=2
+wpa_key_mgmt=WPA-PSK
+wpa_pairwise=CCMP
+wpa_passphrase=12345678
diff --git a/Documentation/networking/mac80211_hwsim/wpa_supplicant.conf b/Documentation/networking/mac80211_hwsim/wpa_supplicant.conf
new file mode 100644
index 00000000000..299128cff03
--- /dev/null
+++ b/Documentation/networking/mac80211_hwsim/wpa_supplicant.conf
@@ -0,0 +1,10 @@
+ctrl_interface=/var/run/wpa_supplicant
+
+network={
+ ssid="mac80211 test"
+ psk="12345678"
+ key_mgmt=WPA-PSK
+ proto=WPA2
+ pairwise=CCMP
+ group=CCMP
+}
diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
index ea5a42e8f79..d391ea63114 100644
--- a/Documentation/networking/multiqueue.txt
+++ b/Documentation/networking/multiqueue.txt
@@ -3,19 +3,11 @@
===========================================
Section 1: Base driver requirements for implementing multiqueue support
-Section 2: Qdisc support for multiqueue devices
-Section 3: Brief howto using PRIO or RR for multiqueue devices
-
Intro: Kernel support for multiqueue devices
---------------------------------------------------------
-Kernel support for multiqueue devices is only an API that is presented to the
-netdevice layer for base drivers to implement. This feature is part of the
-core networking stack, and all network devices will be running on the
-multiqueue-aware stack. If a base driver only has one queue, then these
-changes are transparent to that driver.
-
+Kernel support for multiqueue devices is always present.
Section 1: Base driver requirements for implementing multiqueue support
-----------------------------------------------------------------------
@@ -32,84 +24,4 @@ netif_{start|stop|wake}_subqueue() functions to manage each queue while the
device is still operational. netdev->queue_lock is still used when the device
comes online or when it's completely shut down (unregister_netdev(), etc.).
-Finally, the base driver should indicate that it is a multiqueue device. The
-feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
-bitmap on device initialization. Below is an example from e1000:
-
-#ifdef CONFIG_E1000_MQ
- if ( (adapter->hw.mac.type == e1000_82571) ||
- (adapter->hw.mac.type == e1000_82572) ||
- (adapter->hw.mac.type == e1000_80003es2lan))
- netdev->features |= NETIF_F_MULTI_QUEUE;
-#endif
-
-
-Section 2: Qdisc support for multiqueue devices
------------------------------------------------
-
-Currently two qdiscs support multiqueue devices. A new round-robin qdisc,
-sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to
-bands and queues, and will store the queue mapping into skb->queue_mapping.
-Use this field in the base driver to determine which queue to send the skb
-to.
-
-sch_rr has been added for hardware that doesn't want scheduling policies from
-software, so it's a straight round-robin qdisc. It uses the same syntax and
-classification priomap that sch_prio uses, so it should be intuitive to
-configure for people who've used sch_prio.
-
-In order to utilitize the multiqueue features of the qdiscs, the network
-device layer needs to enable multiple queue support. This can be done by
-selecting NETDEVICES_MULTIQUEUE under Drivers.
-
-The PRIO qdisc naturally plugs into a multiqueue device. If
-NETDEVICES_MULTIQUEUE is selected, then on qdisc load, the number of
-bands requested is compared to the number of queues on the hardware. If they
-are equal, it sets a one-to-one mapping up between the queues and bands. If
-they're not equal, it will not load the qdisc. This is the same behavior
-for RR. Once the association is made, any skb that is classified will have
-skb->queue_mapping set, which will allow the driver to properly queue skb's
-to multiple queues.
-
-
-Section 3: Brief howto using PRIO and RR for multiqueue devices
----------------------------------------------------------------
-
-The userspace command 'tc,' part of the iproute2 package, is used to configure
-qdiscs. To add the PRIO qdisc to your network device, assuming the device is
-called eth0, run the following command:
-
-# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue
-
-This will create 4 bands, 0 being highest priority, and associate those bands
-to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping
-would look like:
-
-band 0 => queue 0
-band 1 => queue 1
-band 2 => queue 2
-band 3 => queue 3
-
-Traffic will begin flowing through each queue if your TOS values are assigning
-traffic across the various bands. For example, ssh traffic will always try to
-go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
-so it will be sent out queue 0. ICMP traffic (pings) fall into the "normal"
-traffic classification, which is band 1. Therefore pings will be send out
-queue 1 on the NIC.
-
-Note the use of the multiqueue keyword. This is only in versions of iproute2
-that support multiqueue networking devices; if this is omitted when loading
-a qdisc onto a multiqueue device, the qdisc will load and operate the same
-if it were loaded onto a single-queue device (i.e. - sends all traffic to
-queue 0).
-
-Another alternative to multiqueue band allocation can be done by using the
-multiqueue option and specify 0 bands. If this is the case, the qdisc will
-allocate the number of bands to equal the number of queues that the device
-reports, and bring the qdisc online.
-
-The behavior of tc filters remains the same, where it will override TOS priority
-classification.
-
-
Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index db0cd516958..07c53d59603 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -326,7 +326,7 @@ just one call to mmap is needed:
mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
If tp_frame_size is a divisor of tp_block_size frames will be
-contiguosly spaced by tp_frame_size bytes. If not, each
+contiguously spaced by tp_frame_size bytes. If not, each
tp_block_size/tp_frame_size frames there will be a gap between
the frames. This is because a frame cannot be spawn across two
blocks.
diff --git a/Documentation/networking/s2io.txt b/Documentation/networking/s2io.txt
index 4bde53e85f3..c3d6b4d5d01 100644
--- a/Documentation/networking/s2io.txt
+++ b/Documentation/networking/s2io.txt
@@ -52,13 +52,10 @@ d. MSI/MSI-X. Can be enabled on platforms which support this feature
(IA64, Xeon) resulting in noticeable performance improvement(upto 7%
on certain platforms).
-e. NAPI. Compile-time option(CONFIG_S2IO_NAPI) for better Rx interrupt
-moderation.
-
-f. Statistics. Comprehensive MAC-level and software statistics displayed
+e. Statistics. Comprehensive MAC-level and software statistics displayed
using "ethtool -S" option.
-g. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings,
+f. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings,
with multiple steering options.
4. Command line parameters
@@ -83,9 +80,9 @@ Valid range: Limited by memory on system
Default: 30
e. intr_type
-Specifies interrupt type. Possible values 1(INTA), 2(MSI), 3(MSI-X)
-Valid range: 1-3
-Default: 1
+Specifies interrupt type. Possible values 0(INTA), 2(MSI-X)
+Valid values: 0, 2
+Default: 2
5. Performance suggestions
General:
diff --git a/Documentation/networking/tc-actions-env-rules.txt b/Documentation/networking/tc-actions-env-rules.txt
index 01e716d185f..dcadf6f88e3 100644
--- a/Documentation/networking/tc-actions-env-rules.txt
+++ b/Documentation/networking/tc-actions-env-rules.txt
@@ -4,26 +4,27 @@ The "enviromental" rules for authors of any new tc actions are:
1) If you stealeth or borroweth any packet thou shalt be branching
from the righteous path and thou shalt cloneth.
-For example if your action queues a packet to be processed later
-or intentionaly branches by redirecting a packet then you need to
+For example if your action queues a packet to be processed later,
+or intentionally branches by redirecting a packet, then you need to
clone the packet.
+
There are certain fields in the skb tc_verd that need to be reset so we
-avoid loops etc. A few are generic enough so much so that skb_act_clone()
-resets them for you. So invoke skb_act_clone() rather than skb_clone()
+avoid loops, etc. A few are generic enough that skb_act_clone()
+resets them for you, so invoke skb_act_clone() rather than skb_clone().
2) If you munge any packet thou shalt call pskb_expand_head in the case
someone else is referencing the skb. After that you "own" the skb.
You must also tell us if it is ok to munge the packet (TC_OK2MUNGE),
this way any action downstream can stomp on the packet.
-3) dropping packets you dont own is a nono. You simply return
+3) Dropping packets you don't own is a no-no. You simply return
TC_ACT_SHOT to the caller and they will drop it.
The "enviromental" rules for callers of actions (qdiscs etc) are:
-*) thou art responsible for freeing anything returned as being
+*) Thou art responsible for freeing anything returned as being
TC_ACT_SHOT/STOLEN/QUEUED. If none of TC_ACT_SHOT/STOLEN/QUEUED is
-returned then all is great and you dont need to do anything.
+returned, then all is great and you don't need to do anything.
Post on netdev if something is unclear.
diff --git a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt
index 3870f280280..855d8da57a2 100644
--- a/Documentation/networking/udplite.txt
+++ b/Documentation/networking/udplite.txt
@@ -148,7 +148,7 @@
getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...);
is meaningless (as in TCP). Packets with a zero checksum field are
- illegal (cf. RFC 3828, sec. 3.1) will be silently discarded.
+ illegal (cf. RFC 3828, sec. 3.1) and will be silently discarded.
4) Fragmentation