aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2008-12-15netfilter: update rwlock initialization for nat_tableSteven Rostedt
The commit e099a173573ce1ba171092aee7bb3c72ea686e59 (netfilter: netns nat: per-netns NAT table) renamed the nat_table from __nat_table to nat_table without updating the __RW_LOCK_UNLOCKED(__nat_table.lock). Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-09tcp: tcp_vegas cong avoid fix Doug Leith
This patch addresses a book-keeping issue in tcp_vegas.c. At present tcp_vegas does separate book-keeping of cwnd based on packet sequence numbers. A mismatch can develop between this book-keeping and tp->snd_cwnd due, for example, to delayed acks acking multiple packets. When vegas transitions to reno operation (e.g. following loss), then this mismatch leads to incorrect behaviour (akin to a cwnd backoff). This seems mostly to affect operation at low cwnds where delayed acking can lead to a significant fraction of cwnd being covered by a single ack, leading to the book-keeping mismatch. This patch modifies the congestion avoidance update to avoid the need for separate book-keeping while leaving vegas congestion avoidance functionally unchanged. A secondary advantage of this modification is that the use of fixed-point (via V_PARAM_SHIFT) and 64 bit arithmetic is no longer necessary, simplifying the code. Some example test measurements with the patched code (confirming no functional change in the congestion avoidance algorithm) can be seen at: http://www.hamilton.ie/doug/vegaspatch/ Signed-off-by: Doug Leith <doug.leith@nuim.ie> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-04tcp: tcp_vegas ssthresh bug fixDoug Leith
This patch fixes a bug in tcp_vegas.c. At the moment this code leaves ssthresh untouched. However, this means that the vegas congestion control algorithm is effectively unable to reduce cwnd below the ssthresh value (if the vegas update lowers the cwnd below ssthresh, then slow start is activated to raise it back up). One example where this matters is when during slow start cwnd overshoots the link capacity and a flow then exits slow start with ssthresh set to a value above where congestion avoidance would like to adjust it. Signed-off-by: Doug Leith <doug.leith@nuim.ie> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03tcp: make urg+gso work for real this timeIlpo Järvinen
I should have noticed this earlier... :-) The previous solution to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zig patterns, or even worse, a steep downward slope into packet counting because each skb pcount would be truncated to pcount of 2 and then the following fragments of the later portion would restore the window again. Basically this reverts "tcp: Do not use TSO/GSO when there is urgent data" (33cf71cee1). It also removes some unnecessary code from tcp_current_mss that didn't work as intented either (could be that something was changed down the road, or it might have been broken since the dawn of time) because it only works once urg is already written while this bug shows up starting from ~64k before the urg point. The retransmissions already are split to mss sized chunks, so only new data sending paths need splitting in case they have a segment otherwise suitable for gso/tso. The actually check can be improved to be more narrow but since this is late -rc already, I'll postpone thinking the more fine-grained things. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-21tcp: Do not use TSO/GSO when there is urgent dataPetr Tesarik
This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014 Since most (if not all) implementations of TSO and even the in-kernel software GSO do not update the urgent pointer when splitting a large segment, it is necessary to turn off TSO/GSO for all outgoing traffic with the URG pointer set. Looking at tcp_current_mss (and the preceding comment) I even think this was the original intention. However, this approach is insufficient, because TSO/GSO is turned off only for newly created frames, not for frames which were already pending at the arrival of a message with MSG_OOB set. These frames were created when TSO/GSO was enabled, so they may be large, and they will have the urgent pointer set in tcp_transmit_skb(). With this patch, such large packets will be fragmented again before going to the transmit routine. As a side note, at least the following NICs are known to screw up the urgent pointer in the TCP header when doing TSO: Intel 82566MM (PCI ID 8086:1049) Intel 82566DC (PCI ID 8086:104b) Intel 82541GI (PCI ID 8086:1076) Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c) Signed-off-by: Petr Tesarik <ptesarik@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20TPROXY: supply a struct flowi->flags argument in inet_sk_rebuild_header()Balazs Scheidler
inet_sk_rebuild_header() does a new route lookup if the dst_entry associated with a socket becomes stale. However inet_sk_rebuild_header() didn't use struct flowi->flags, causing the route lookup to fail for foreign-bound IP_TRANSPARENT sockets, causing an error state to be set for the sockets in question. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20TPROXY: fill struct flowi->flags in udp_sendmsg()Balazs Scheidler
udp_sendmsg() didn't fill struct flowi->flags, which means that the route lookup would fail for non-local IPs even if the IP_TRANSPARENT sockopt was set. This prevents sendto() to work properly for UDP sockets, whereas bind(foreign-ip) + connect() + send() worked fine. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19net: fix ip_mr_init() error pathBenjamin Thery
Similarly to IPv6 ip6_mr_init() (fixed last week), the order of cleanup operations in the error/exit section of ip_mr_init() is completely inversed. It should be the other way around. Also a del_timer() is missing in the error path. I should have guessed last week that this same error existed in ipmr.c too, as ip6mr.c is largely inspired by ipmr.c. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-12net: shy netns_ok checkAlexey Dobriyan
Failure to pass netns_ok check is SILENT, except some MIB counter is incremented somewhere. And adding "netns_ok = 1" (after long head-scratching session) is usually the last step in making some protocol netns-ready... Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-12tcp_htcp: last_cong bug fixDoug Leith
This patch fixes a minor bug in tcp_htcp.c which has been highlighted by Lachlan Andrew and Lawrence Stewart. Currently, the time since the last congestion event, which is stored in variable last_cong, is reset whenever there is a state change into TCP_CA_Open. This includes transitions of the type TCP_CA_Open->TCP_CA_Disorder->TCP_CA_Open which are not associated with backoff of cwnd. The patch changes last_cong to be updated only on transitions into TCP_CA_Open that occur after experiencing the congestion-related states TCP_CA_Loss, TCP_CA_Recovery, TCP_CA_CWR. Signed-off-by: Doug Leith <doug.leith@nuim.ie> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-10net: fix /proc/net/snmp as memory corruptorEric Dumazet
icmpmsg_put() can happily corrupt kernel memory, using a static table and forgetting to reset an array index in a loop. Remove the static array since its not safe without proper locking. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05tcp: Fix recvmsg MSG_PEEK influence of blocking behavior.David S. Miller
Vito Caputo noticed that tcp_recvmsg() returns immediately from partial reads when MSG_PEEK is used. In particular, this means that SO_RCVLOWAT is not respected. Simply remove the test. And this matches the behavior of several other systems, including BSD. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04xfrm: Have af-specific init_tempsel() initialize family field of temporary ↵Andreas Steffen
selector While adding MIGRATE support to strongSwan, Andreas Steffen noticed that the selectors provided in XFRM_MSG_ACQUIRE have their family field uninitialized (those in MIGRATE do have their family set). Looking at the code, this is because the af-specific init_tempsel() (called via afinfo->init_tempsel() in xfrm_init_tempsel()) do not set the value. Reported-by: Andreas Steffen <andreas.steffen@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
2008-11-01udp: multicast packets need to check namespaceEric Dumazet
Current UDP multicast delivery is not namespace aware. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29cipso: unsigned buf_len cannot be negativeroel kluin
unsigned buf_len cannot be negative Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Paul Moore <paul.moore@hp.com>
2008-10-26syncookies: fix inclusion of tcp options in syn-ackFlorian Westphal
David Miller noticed that commit 33ad798c924b4a1afad3593f2796d465040aadd5 '(tcp: options clean up') did not move the req->cookie_ts check. This essentially disabled commit 4dfc2817025965a2fc78a18c50f540736a6b5c24 '[Syncookies]: Add support for TCP options via timestamps.'. This restores the original logic. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-23tcp: Restore ordering of TCP options for the sake of inter-operabilityIlpo Järvinen
This is not our bug! Sadly some devices cannot cope with the change of TCP option ordering which was a result of the recent rewrite of the option code (not that there was some particular reason steming from the rewrite for the reordering) though any ordering of TCP options is perfectly legal. Thus we restore the original ordering to allow interoperability with/through such broken devices and add some warning about this trap. Since the reordering just happened without any particular reason, this change shouldn't cost us anything. There are already couple of known failure reports (within close proximity of the last release), so the problem might be more wide-spread than a single device. And other reports which may be due to the same problem though the symptoms were less obvious. Analysis of one of the case revealed (with very high probability) that sack capability cannot be negotiated as the first option (SYN never got a response). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Aldo Maggi <sentiniate@tiscali.it> Tested-by: Aldo Maggi <sentiniate@tiscali.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-21tcp: should use number of sack blocks instead of -1Ilpo Järvinen
While looking for the recent "sack issue" I also read all eff_sacks usage that was played around by some relevant commit. I found out that there's another thing that is asking for a fix (unrelated to the "sack issue" though). This feature has probably very little significance in practice. Opposite direction timeout with bidirectional tcp comes to me as the most likely scenario though there might be other cases as well related to non-data segments we send (e.g., response to the opposite direction segment). Also some ACK losses or option space wasted for other purposes is necessary to prevent the earlier SACK feedback getting to the sender. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: netfilter: replace old NF_ARP calls with NFPROTO_ARP netfilter: fix compilation error with NAT=n netfilter: xt_recent: use proc_create_data() netfilter: snmp nat leaks memory in case of failure netfilter: xt_iprange: fix range inversion match netfilter: netns: use NFPROTO_NUMPROTO instead of NUMPROTO for tables array netfilter: ctnetlink: remove obsolete NAT dependency from Kconfig pkt_sched: sch_generic: Fix oops in sch_teql dccp: Port redirection support for DCCP tcp: Fix IPv6 fallout from 'Port redirection support for TCP' netdev: change name dropping error codes ipvs: Update CONFIG_IP_VS_IPV6 description and help text
2008-10-20netfilter: replace old NF_ARP calls with NFPROTO_ARPJan Engelhardt
(Supplements: ee999d8b9573df1b547aacdc6d79f86eb79c25cd) NFPROTO_ARP actually has a different value from NF_ARP, so ensure all callers use the new value so that packets _do_ get delivered to the registered hooks. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-20netfilter: snmp nat leaks memory in case of failureIlpo Järvinen
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely) ipv4: Add a missing rcu_assign_pointer() in routing cache. [netdrvr] ibmtr: PCMCIA IBMTR is ok on 64bit xen-netfront: Avoid unaligned accesses to IP header lmc: copy_*_user under spinlock [netdrvr] myri10ge, ixgbe: remove broken select INTEL_IOATDMA
2008-10-16net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely)Johannes Berg
Some code here depends on CONFIG_KMOD to not try to load protocol modules or similar, replace by CONFIG_MODULES where more than just request_module depends on CONFIG_KMOD and and also use try_then_request_module in ebtables. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-16ipv4: Add a missing rcu_assign_pointer() in routing cache.Eric Dumazet
rt_intern_hash() is doing an update of a RCU guarded hash chain without using rcu_assign_pointer() or equivalent barrier. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits) decnet: Fix compiler warning in dn_dev.c IPV6: Fix default gateway criteria wrt. HIGH/LOW preference radv option net/802/fc.c: Fix compilation warnings netns: correct mib stats in ip6_route_me_harder() netns: fix net_generic array leak rt2x00: fix regression introduced by "mac80211: free up 2 bytes in skb->cb" rtl8187: Add USB ID for Belkin F5D7050 with RTL8187B chip p54usb: Device ID updates mac80211: fixme for kernel-doc ath9k/mac80211: disallow fragmentation in ath9k, report to userspace libertas : Remove unused variable warning for "old_channel" from cmd.c mac80211: Fix scan RX processing oops orinoco: fix unsafe locking in spectrum_cs_suspend orinoco: fix unsafe locking in orinoco_cs_resume cfg80211: fix debugfs error handling mac80211: fix debugfs netdev rename iwlwifi: fix ct kill configuration for 5350 mac80211: fix HT information element parsing p54: Fix compilation problem on PPC mac80211: fix debugfs lockup ...
2008-10-16sysctl: simplify ->strategyAlexey Dobriyan
name and nlen parameters passed to ->strategy hook are unused, remove them. In general ->strategy hook should know what it's doing, and don't do something tricky for which, say, pointer to original userspace array may be needed (name). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> [ networking bits ] Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Matt Mackall <mpm@selenic.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-14netfilter: ctnetlink: remove bogus module dependency between ctnetlink and ↵Pablo Neira Ayuso
nf_nat This patch removes the module dependency between ctnetlink and nf_nat by means of an indirect call that is initialized when nf_nat is loaded. Now, nf_conntrack_netlink only requires nf_conntrack and nfnetlink. This patch puts nfnetlink_parse_nat_setup_hook into the nf_conntrack_core to avoid dependencies between ctnetlink, nf_conntrack_ipv4 and nf_conntrack_ipv6. This patch also introduces the function ctnetlink_change_nat that is only invoked from the creation path. Actually, the nat handling cannot be invoked from the update path since this is not allowed. By introducing this function, we remove the useless nat handling in the update path and we avoid deadlock-prone code. This patch also adds the required EAGAIN logic for nfnetlink. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-14netfilter: restore lost #ifdef guarding defrag exceptionPatrick McHardy
Nir Tzachar <nir.tzachar@gmail.com> reported a warning when sending fragments over loopback with NAT: [ 6658.338121] WARNING: at net/ipv4/netfilter/nf_nat_standalone.c:89 nf_nat_fn+0x33/0x155() The reason is that defragmentation is skipped for already tracked connections. This is wrong in combination with NAT and ip_conntrack actually had some ifdefs to avoid this behaviour when NAT is compiled in. The entire "optimization" may seem a bit silly, for now simply restoring the lost #ifdef is the easiest solution until we can come up with something better. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: qlge: Fix page size ifdef test. net: Rationalise email address: Network Specific Parts dsa: fix compile bug on s390 netns: mib6 section fixlet enic: Fix Kconfig headline description de2104x: wrong MAC address fix s390: claw compile fixlet net: export genphy_restart_aneg cxgb3: extend copyrights to 2008 cxgb3: update driver version net/phy: add missing kernel-doc pktgen: fix skb leak in case of failure mISDN/dsp_cmx.c: fix size checks misdn: use nonseekable_open() net: fix driver build errors due to missing net/ip6_checksum.h include
2008-10-13net: Rationalise email address: Network Specific PartsAlan Cox
Clean up the various different email addresses of mine listed in the code to a single current and valid address. As Dave says his network merges for 2.6.28 are now done this seems a good point to send them in where they won't risk disrupting real changes. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-13Merge branch 'next' into for-linusJames Morris
2008-10-11gre: Initialise rtnl_link tunnel parameters properlyHerbert Xu
Brown paper bag error of calling memset with sizeof(p) instead of sizeof(*p). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-10gre: minor cleanups in netlink interfacePatrick McHardy
- use typeful helpers for IFLA_GRE_LOCAL/IFLA_GRE_REMOTE - replace magic value by FIELD_SIZEOF - use MODULE_ALIAS_RTNL_LINK macro Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-10gre: fix copy and paste errorPatrick McHardy
The flags are dumped twice, the keys not at all. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-10cipso: Add support for native local labeling and fixup mapping namesPaul Moore
This patch accomplishes three minor tasks: add a new tag type for local labeling, rename the CIPSO_V4_MAP_STD define to CIPSO_V4_MAP_TRANS and replace some of the CIPSO "magic numbers" with constants from the header file. The first change allows CIPSO to support full LSM labels/contexts, not just MLS attributes. The second change brings the mapping names inline with what userspace is using, compatibility is preserved since we don't actually change the value. The last change is to aid readability and help prevent mistakes. Signed-off-by: Paul Moore <paul.moore@hp.com>
2008-10-10selinux: Set socket NetLabel based on connection endpointPaul Moore
Previous work enabled the use of address based NetLabel selectors, which while highly useful, brought the potential for additional per-packet overhead when used. This patch attempts to solve that by applying NetLabel socket labels when sockets are connect()'d. This should alleviate the per-packet NetLabel labeling for all connected sockets (yes, it even works for connected DGRAM sockets). Signed-off-by: Paul Moore <paul.moore@hp.com> Reviewed-by: James Morris <jmorris@namei.org>
2008-10-10netlabel: Add functionality to set the security attributes of a packetPaul Moore
This patch builds upon the new NetLabel address selector functionality by providing the NetLabel KAPI and CIPSO engine support needed to enable the new packet-based labeling. The only new addition to the NetLabel KAPI at this point is shown below: * int netlbl_skbuff_setattr(skb, family, secattr) ... and is designed to be called from a Netfilter hook after the packet's IP header has been populated such as in the FORWARD or LOCAL_OUT hooks. This patch also provides the necessary SELinux hooks to support this new functionality. Smack support is not currently included due to uncertainty regarding the permissions needed to expand the Smack network access controls. Signed-off-by: Paul Moore <paul.moore@hp.com> Reviewed-by: James Morris <jmorris@namei.org>
2008-10-10netlabel: Replace protocol/NetLabel linking with refrerence countsPaul Moore
NetLabel has always had a list of backpointers in the CIPSO DOI definition structure which pointed to the NetLabel LSM domain mapping structures which referenced the CIPSO DOI struct. The rationale for this was that when an administrator removed a CIPSO DOI from the system all of the associated NetLabel LSM domain mappings should be removed as well; a list of backpointers made this a simple operation. Unfortunately, while the backpointers did make the removal easier they were a bit of a mess from an implementation point of view which was making further development difficult. Since the removal of a CIPSO DOI is a realtively rare event it seems to make sense to remove this backpointer list as the optimization was hurting us more then it was helping. However, we still need to be able to track when a CIPSO DOI definition is being used so replace the backpointer list with a reference count. In order to preserve the current functionality of removing the associated LSM domain mappings when a CIPSO DOI is removed we walk the LSM domain mapping table, removing the relevant entries. Signed-off-by: Paul Moore <paul.moore@hp.com> Reviewed-by: James Morris <jmorris@namei.org>
2008-10-09udp: complete port availability checkingEric Dumazet
While looking at UDP port randomization, I noticed it was litle bit pessimistic, not looking at type of sockets (IPV6/IPV4) and not looking at bound addresses if any. We should perform same tests than when binding to a specific port. This permits a cleanup of udp_lib_get_port() Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-09tcpv[46]: fix md5 pseudoheader address field orderingIlpo Järvinen
Maybe it's just me but I guess those md5 people made a mess out of it by having *_md5_hash_* to use daddr, saddr order instead of the one that is natural (and equal to what csum functions use). For the segment were sending, the original addresses are reversed so buff's saddr == skb's daddr and vice-versa. Maybe I can finally proceed with unification of some code after fixing it first... :-) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-09inet: Make tunnel RX/TX byte counters more consistentHerbert Xu
This patch makes the RX/TX byte counters for IPIP, GRE and SIT more consistent. Previously we included the external IP headers on the way out but not when the packet is inbound. The new scheme is to count payload only in both directions. For IPIP and SIT this simply means the exclusion of the external IP header. For GRE this means that we exclude the GRE header as well. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-09gre: Add Transparent Ethernet BridgingHerbert Xu
This patch adds support for Ethernet over GRE encapsulation. This is exposed to user-space with a new link type of "gretap" instead of "gre". It will create an ARPHRD_ETHER device in lieu of the usual ARPHRD_IPGRE. Note that to preserver backwards compatibility all Transparent Ethernet Bridging packets are passed to an ARPHRD_IPGRE tunnel if its key matches and there is no ARPHRD_ETHER device whose key matches more closely. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-09gre: Add netlink interfaceHerbert Xu
This patch adds a netlink interface that will eventually displace the existing ioctl interface. It utilises the elegant rtnl_link_ops mechanism. This also means that user-space no longer needs to rely on the tunnel interface being of type GRE to identify GRE tunnels. The identification can now occur using rtnl_link_ops. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-09gre: Move MTU setting out of ipgre_tunnel_bind_devHerbert Xu
This patch moves the dev->mtu setting out of ipgre_tunnel_bind_dev. This is in prepartion of using rtnl_link where we'll need to make the MTU setting conditional on whether the user has supplied an MTU. This also requires the move of the ipgre_tunnel_bind_dev call out of the dev->init function so that we can access the user parameters later. This patch also adds a check to prevent setting the MTU below the minimum of 68. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-09gre: Use needed_headroomHerbert Xu
Now that we have dev->needed_headroom, we can use it instead of having a bogus dev->hard_header_len. This also allows us to include dev->hard_header_len in the MTU computation so that when we do have a meaningful hard_harder_len in future it is included automatically in figuring out the MTU. Incidentally, this fixes a bug where we ignored the needed_headroom field of the underlying device in calculating our own hard_header_len. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/e1000e/ich8lan.c drivers/net/e1000e/netdev.c
2008-10-08ipvs: Remove stray file left over from ipvs moveSven Wegener
Commit cb7f6a7b716e801097b564dec3ccb58d330aef56 ("IPVS: Move IPVS to net/netfilter/ipvs") has left a stray file in the old location of ipvs. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08Merge branch 'lvs-next-2.6' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-2.6 Conflicts: net/netfilter/Kconfig
2008-10-08inet: cleanup of local_port_rangeEric Dumazet
I noticed sysctl_local_port_range[] and its associated seqlock sysctl_local_port_range_lock were on separate cache lines. Moreover, sysctl_local_port_range[] was close to unrelated variables, highly modified, leading to cache misses. Moving these two variables in a structure can help data locality and moving this structure to read_mostly section helps sharing of this data among cpus. Cleanup of extern declarations (moved in include file where they belong), and use of inet_get_local_port_range() accessor instead of direct access to ports values. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08udp: Improve port randomizationEric Dumazet
Current UDP port allocation is suboptimal. We select the shortest chain to chose a port (out of 512) that will hash in this shortest chain. First, it can lead to give not so ramdom ports and ease give attackers more opportunities to break the system. Second, it can consume a lot of CPU to scan all table in order to find the shortest chain. Third, in some pathological cases we can fail to find a free port even if they are plenty of them. This patch zap the search for a short chain and only use one random seed. Problem of getting long chains should be addressed in another way, since we can obtain long chains with non random ports. Based on a report and patch from Vitaly Mayatskikh Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>