aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2009-09-04sctp: Sysctl configuration for IPv4 Address ScopingBhaskar Dutta
This patch introduces a new sysctl option to make IPv4 Address Scoping configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>. In networking environments where DNAT rules in iptables prerouting chains convert destination IP's to link-local/private IP addresses, SCTP connections fail to establish as the INIT chunk is dropped by the kernel due to address scope match failure. For example to support overlapping IP addresses (same IP address with different vlan id) a Layer-5 application listens on link local IP's, and there is a DNAT rule that maps the destination IP to a link local IP. Such applications never get the SCTP INIT if the address-scoping draft is strictly followed. This sysctl configuration allows SCTP to function in such unconventional networking environments. Sysctl options: 0 - Disable IPv4 address scoping draft altogether 1 - Enable IPv4 address scoping (default, current behavior) 2 - Enable address scoping but allow IPv4 private addresses in init/init-ack 3 - Enable address scoping but allow IPv4 link local address in init/init-ack Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Get rid of an extra routing lookup when adding a transport.Vlad Yasevich
We used to perform 2 routing lookups for a new transport: one just for path mtu detection, and one to actually route to destination and path mtu update when sending a packet. There is no point in doing both of them, especially since the first one just for path mtu doesn't take into account source address and sometimes gives the wrong route, causing path mtu updates anyway. We now do just the one call to do both route to destination and get path mtu updates. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Turn flags in 'sctp_packet' into bit fieldsVlad Yasevich
This shrinks the size of sctp_packet a little. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Correctly track if AUTH has been bundled.Vlad Yasevich
We currently track if AUTH has been bundled using the 'auth' pointer to the chunk. However, AUTH is disallowed after DATA is already in the packet, so we need to instead use the 'has_auth' field. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: fix to reset packet information after packet transmitWei Yongjun
The packet information does not reset after packet transmit, this may cause some problems such as following DATA chunk be sent without AUTH chunk, even if the authentication of DATA chunk has been requested by the peer. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Failover transmitted list on transport deleteVlad Yasevich
Add-IP feature allows users to delete an active transport. If that transport has chunks in flight, those chunks need to be moved to another transport or association may get into unrecoverable state. Reported-by: Rafael Laufer <rlaufer@cisco.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Fix SCTP_MAXSEG socket option to comply to spec.Vlad Yasevich
We had a bug that we never stored the user-defined value for MAXSEG when setting the value on an association. Thus future PMTU events ended up re-writing the frag point and increasing it past user limit. Additionally, when setting the option on the socket/endpoint, we effect all current associations, which is against spec. Now, we store the user 'maxseg' value along with the computed 'frag_point'. We inherit 'maxseg' from the socket at association creation and use it as an upper limit for 'frag_point' when its set. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Don't do NAGLE delay on large writes that were fragmented smallVlad Yasevich
SCTP will delay the last part of a large write due to NAGLE, if that part is smaller then MTU. Since we are doing large writes, we might as well send the last portion now instead of waiting untill the next large write happens. The small portion will be sent as is regardless, so it's better to not delay it. This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com> and Doug Graham <dgraham@nortel.com>. Many thanks go out to them. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Nagle delay should be based on path mtuVlad Yasevich
The decision to delay due to Nagle should be based on the path mtu and future packet size. We currently incorrectly base it on 'frag_point' which is the SCTP DATA segment size, and also we do not count DATA chunk header overhead in the computation. This actuall allows situations where a user can set low 'frag_point', and then send small messages without delay. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN.Vlad Yasevich
We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN. This results in an hung association if the remote only uses SHUTDOWNs (which it's allowed to do) to acknowlege DATA when closing. The reason for that is that we simply honor the a_rwnd from the sack, but since we faked it to be 0, we enter 0-window probing. The fix is to use the peers old rwnd and add our flight size to it. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: drop a_rwnd to 0 when receive buffer overflows.Vlad Yasevich
SCTP has a problem that when small chunks are used, it is possible to exhaust the receiver buffer without fully closing receive window. This happens due to all overhead that we have account for with small messages. To fix this, when receive buffer is exceeded, we'll drop the window to 0 and save the 'drop' portion. When application starts reading data and freeing up recevie buffer space, we'll wait until we've reached the 'drop' window and then add back this 'drop' one mtu at a time. This worked well in testing and under stress produced rather even recovery. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Clear fast_recovery on the transport when T3 timer expires.Vlad Yasevich
If T3 timer expires, we are retransmitting data due to timeout any any fast recovery is null and void. We can clear the fast recovery flag. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Fix error count increments that were results of HEARTBEATSVlad Yasevich
SCTP RFC 4960 states that unacknowledged HEARTBEATS count as errors agains a given transport or endpoint. As such, we should increment the error counts for only for unacknowledged HB, otherwise we detect failure too soon. This goes for both the overall error count and the path error count. Now, there is a difference in how the detection is done between the two. The path error detection is done after the increment, so to detect it properly, we actually need to exceed the path threshold. The overall error detection is done _BEFORE_ the increment. Thus to detect the failure, it's enough for the error count to match the threshold. This is why all the state functions use '>=' to detect failure, while path detection uses '>'. Thanks goes to Chunbo Luo <chunbo.luo@windriver.com> who first proposed patches to fix this issue and made me re-read the spec and the code to figure out how this cruft really works. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: use proc_create()Alexey Dobriyan
create_proc_entry() is deprecated (not formally, though). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: fix check the chunk length of received HEARTBEAT-ACK chunkWei Yongjun
The receiver of the HEARTBEAT should respond with a HEARTBEAT ACK that contains the Heartbeat Information field copied from the received HEARTBEAT chunk. So the received HEARTBEAT-ACK chunk must have a length of: sizeof(sctp_chunkhdr_t) + sizeof(sctp_sender_hb_info_t) A badly formatted HB-ACK chunk, it is possible that we may access invalid memory. We should really make sure that the chunk format is what we expect, before attempting to touch the data. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: drop SHUTDOWN chunk if the TSN is less than the CTSNWei Yongjun
If Cumulative TSN Ack field of SHUTDOWN chunk is less than the Cumulative TSN Ack Point then drop the SHUTDOWN chunk. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Send user messages to the lower layer as oneVlad Yasevich
Currenlty, sctp breaks up user messages into fragments and sends each fragment to the lower layer by itself. This means that for each fragment we go all the way down the stack and back up. This also discourages bundling of multiple fragments when they can fit into a sigle packet (ex: due to user setting a low fragmentation threashold). We introduce a new command SCTP_CMD_SND_MSG and hand the whole message down state machine. The state machine and the side-effect parser will cork the queue, add all chunks from the message to the queue, and then un-cork the queue thus causing the chunks to get transmitted. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Try to encourage SACK bundling with DATA.Vlad Yasevich
If the association has a SACK timer pending and now DATA queued to be send, we'll try to bundle the SACK with the next application send. As such, try encourage bundling by accounting for SACK in the size of the first chunk fragment. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Generate SACKs when actually sending outbound DATAVlad Yasevich
We are now trying to bundle SACKs when we have outbound DATA to send. However, there are situations where this outbound DATA will not be sent (due to congestion or available window). In such cases it's ok to wait for the timer to expire. This patch refactors the sending code so that betfore attempting to bundle the SACK we check to see if the DATA will actually be transmitted. Based on eirlier works for Doug Graham <dgraham@nortel.com> and Wei Youngjun <yjwei@cn.fujitsu.com>. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Fix data segmentation with small frag_sizeVlad Yasevich
Since an application may specify the maximum SCTP fragment size that all data should be fragmented to, we need to fix how we do segmentation. Right now, if a user specifies a small fragment size, the segment size can go negative in the presence of AUTH or COOKIE_ECHO bundling. What we need to do is track the largest possbile DATA chunk that can fit into the mtu. Then if the fragment size specified is bigger then this maximum length, we'll shrink it down. Otherwise, we just use the smaller segment size without changing it further. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Disallow new connection on a closing socketVlad Yasevich
If a socket has a lot of association that are in the process of of being closed/aborted, it is possible for a remote to establish new associations during the time period that the old ones are shutting down. If this was a result of a close() call, there will be no socket and will cause a memory leak. We'll prevent this by setting the socket state to CLOSING and disallow new associations when in this state. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Fix piggybacked ACKsDoug Graham
This patch corrects the conditions under which a SACK will be piggybacked on a DATA packet. The previous condition was incorrect due to a misinterpretation of RFC 4960 and/or RFC 2960. Specifically, the following paragraph from section 6.2 had not been implemented correctly: Before an endpoint transmits a DATA chunk, if any received DATA chunks have not been acknowledged (e.g., due to delayed ack), the sender should create a SACK and bundle it with the outbound DATA chunk, as long as the size of the final SCTP packet does not exceed the current MTU. See Section 6.2. When about to send a DATA chunk, the code now checks to see if the SACK timer is running. If it is, we know we have a SACK to send to the peer, so we append the SACK (assuming available space in the packet) and turn off the timer. For a simple request-response scenario, this will result in the SACK being bundled with the response, meaning the the SACK is received quickly by the client, and also meaning that no separate SACK packet needs to be sent by the server to acknowledge the request. Prior to this patch, a separate SACK packet would have been sent by the server SCTP only after its delayed-ACK timer had expired (usually 200ms). This is wasteful of bandwidth, and can also have a major negative impact on performance due the interaction of delayed ACKs with the Nagle algorithm. Signed-off-by: Doug Graham <dgraham@nortel.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: remove unused union (sctp_cmsg_data_t) definitionRami Rosen
This patch removes an unused union definition (sctp_cmsg_data_t) from include/net/sctp/user.h. Signed-off-by: Rami Rosen <rosenrami@gmail.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: release cached route when the transport goes down.Vlad Yasevich
When the sctp transport is marked down, we can release the cached route and force a new lookup when attempting to use this transport for anything. This way, if a better route or source address is available, we'll try to use it. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: update the route for non-active transports after addresses are addedWei Yongjun
Update the route and saddr entries for the non-active transports as some of the added addresses can be used as better source addresses, or may be there is a better route. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: check the unrecognized ASCONF parameter before access itWei Yongjun
This patch fix to check the unrecognized ASCONF parameter before access it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: avoid overwrite the return value of sctp_process_asconf_ack()Wei Yongjun
The return value of sctp_process_asconf_ack() may be overwritten while process parameters with no error. This patch fixed the problem. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04net: Fix a build break because of a typo in drivers/net/3c503.cSachin Sant
Signed-off-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-04can: sja1000: legacy SJA1000 ISA bus driverWolfgang Grandegger
This patch adds support for legacy SJA1000 CAN controllers on the ISA or PC-104 bus. The I/O port or memory address and the IRQ number must be specified via module parameters: insmod sja1000_isa.ko port=0x310,0x380 irq=7,11 for ISA devices using I/O ports or: insmod sja1000_isa.ko mem=0xd1000,0xd1000 irq=7,11 for memory mapped ISA devices. Indirect access via address and data port is supported as well: insmod sja1000_isa.ko port=0x310,0x380 indirect=1 irq=7,11 Here is a full list of the supported module parameters: port:I/O port number (array of ulong) mem:I/O memory address (array of ulong) indirect:Indirect access via address and data port (array of byte) irq:IRQ number (array of int) clk:External oscillator clock frequency (default=16000000 [16 MHz]) (array of int) cdr:Clock divider register (default=0x48 [CDR_CBP | CDR_CLK_OFF]) (array of byte) ocr:Output clock register (default=0x18 [OCR_TX0_PUSHPULL]) (array of byte) Note: for clk, cdr, ocr, the first argument re-defines the default for all other devices, e.g.: insmod sja1000_isa.ko mem=0xd1000,0xd1000 irq=7,11 clk=24000000 is equivalent to insmod sja1000_isa.ko mem=0xd1000,0xd1000 irq=7,11 \ clk=24000000,24000000 Signed-off-by: Wolfgang Grandegger <wg@grandegger.com> Tested-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-04can: sja1000: fix network statistics updateWolfgang Grandegger
The member "tx_bytes" of "struct net_device_stats" should be incremented when the interrupt is done and an "arbitration lost error" is a TX error and the statistics should be updated accordingly. Signed-off-by: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-04can: add can_free_echo_skb() for upcoming driversWolfgang Grandegger
This patch adds the function can_free_echo_skb to the CAN device interface to allow upcoming drivers to release echo skb's in case of error. Signed-off-by: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03WAN: dscc4: Fix warning pointing out a bug.David S. Miller
Noticed by Stephen Rothwell: Today's linux-next build (x86_64 allmodconfig gcc-4.4.0) produced this warning: drivers/net/wan/dscc4.c: In function 'dscc4_rx_skb': drivers/net/wan/dscc4.c:670: warning: suggest parentheses around comparison in operand of '|' which actually points out a bug, I think. It is doing (x & (y | z)) != y | z when it probably means (x & (y | z)) != (y | z) Introduced by commit 5de3fcab91b0e1809eec030355d15801daf25083 ("WAN: bit and/or confusion"). Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03ipv6: Fix tcp_v6_send_response(): it didn't set skb transport headerCosmin Ratiu
Here is a patch which fixes an issue observed when using TCP over IPv6 and AH from IPsec. When a connection gets closed the 4-way method and the last ACK from the server gets dropped, the subsequent FINs from the client do not get ACKed because tcp_v6_send_response does not set the transport header pointer. This causes ah6_output to try to allocate a lot of memory, which typically fails, so the ACKs never make it out of the stack. I have reproduced the problem on kernel 2.6.7, but after looking at the latest kernel it seems the problem is still there. Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03enic: organize device initialization/deinit into separate functionsScott Feldman
To unclutter probe() a little bit, put all device initialization code in one spot and device deinit code in another spot. Also remove unused rq->buf_index variable/func. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03enic: bug fix: check for zero port MTU before posting warningScott Feldman
Nic firmware can return zero for port MTU, so check for non-zero value before checking for change in port MTU. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03enic: changes to driver/firmware interfaceScott Feldman
Deprecate some old APIa; change arguments to stats dump all API; add new interrupt assert API Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03enic: bug fix: enable VLAN filteringScott Feldman
Bug fix: enable VLAN filtering Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03enic: provision for multiple Rx/Tx queues; prepare for RSS supportScott Feldman
Provision for multiple Rx/Tx queues. Max of 8 WQs and 8 RQs. Max for completion queue is 8+8=16 and max for interrupt resources is 8+8+2. Add driver/firmware interface for setting up RSS secret key and indirection table. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03enic: bug fix: included MAC drops in rx_dropped netstatScott Feldman
Bug fix: included MAC drops in rx_dropped netstat. Also track Rx trunctations stat at the MAC Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03enic: bug fix: protect fw call i/f with spinlockScott Feldman
Some driver -> nic firmware calls weren't guarded with a spinlock, exposing the call i/f to a race between two threads Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03enic: use netdev_alloc_skbScott Feldman
Use netdev_alloc_skb rather than dev_alloc_skb Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03enic: bug fix: split TSO fragments larger than 16K into multiple descsScott Feldman
enic WQ desc supports a maximum 16K buf size, so split any send fragments larger than 16K into several descs. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03enic: workaround A0 erratumScott Feldman
A0 revision ASIC has an erratum on the RQ desc cache on chip where the cache can become corrupted causing pkt buf writes to wrong locations. The s/w workaround is to post a dummy RQ desc in the ring every 32 descs, causing a flush of the cache. A0 parts are not production, but there are enough of these parts in the wild in test setups to warrant including workaround. A1 revision ASIC parts fix erratum. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03enic: add support for multiple BARsScott Feldman
Nic firmware can place resources (queues, intrs, etc) on multiple BARs, so allow driver to discover/map resources beyond BAR0. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03vlan: adds drops accountingEric Dumazet
Its hard to tell if vlans are dropping frames, since every frame given to vlan_???_start_xmit() functions is accounted as fully transmitted by lower device. We can test dev_queue_xmit() return values to properly account for dropped frames. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03macvlan: add multiqueue capabilityEric Dumazet
macvlan devices are currently not multi-queue capable. We can do that defining rtnl_link_ops method, get_tx_queues(), called from rtnl_create_link() This new method gets num_tx_queues/real_num_tx_queues from lower device. macvlan_get_tx_queues() is a copy of vlan_get_tx_queues(). Because macvlan_start_xmit() has to update netdev_queue stats only (and not dev->stats), I chose to change tx_errors/tx_aborted_errors accounting to tx_dropped, since netdev_queue structure doesnt define tx_errors / tx_aborted_errors. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03netdev: Convert MDIO ioctl implementation to use struct mii_ioctl_dataBen Hutchings
A few drivers still access the arguments to MDIO ioctls as an array of u16. Convert them to use struct mii_ioctl_data. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03netdev: Remove redundant checks for CAP_NET_ADMIN in MDIO implementationsBen Hutchings
dev_ioctl() already checks capable(CAP_NET_ADMIN) before calling the driver's implementation of MDIO ioctls. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03netdev: Remove SIOCDEVPRIVATE aliases for MDIO ioctlsBen Hutchings
The standard MDIO ioctl numbers are well-established and these should no longer be needed. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03sky2: only enable Vaux if capable of wakeupStephen Hemminger
While perusing vendor driver, I saw that it did not enable the Vaux power unless device was able to wake from lan for D3cold. This might help for Rene's power issue. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>