aboutsummaryrefslogtreecommitdiff
path: root/net/ipv6/ip6_output.c
AgeCommit message (Collapse)Author
2009-06-11net: No more expensive sock_hold()/sock_put() on each txEric Dumazet
One of the problem with sock memory accounting is it uses a pair of sock_hold()/sock_put() for each transmitted packet. This slows down bidirectional flows because the receive path also needs to take a refcount on socket and might use a different cpu than transmit path or transmit completion path. So these two atomic operations also trigger cache line bounces. We can see this in tx or tx/rx workloads (media gateways for example), where sock_wfree() can be in top five functions in profiles. We use this sock_hold()/sock_put() so that sock freeing is delayed until all tx packets are completed. As we also update sk_wmem_alloc, we could offset sk_wmem_alloc by one unit at init time, until sk_free() is called. Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc) to decrement initial offset and atomicaly check if any packets are in flight. skb_set_owner_w() doesnt call sock_hold() anymore sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc reached 0 to perform the final freeing. Drawback is that a skb->truesize error could lead to unfreeable sockets, or even worse, prematurely calling __sk_free() on a live socket. Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt contention point. 5 % speedup on a UDP transmit workload (depends on number of flows), lowering TX completion cpu usage. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09ipv6: Use frag list abstraction interfaces.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03net: skb->dst accessorsEric Dumazet
Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27snmp: add missing counters for RFC 4293Neil Horman
The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and OutMcastOctets: http://tools.ietf.org/html/rfc4293 But it seems we don't track those in any way that easy to separate from other protocols. This patch adds those missing counters to the stats file. Tested successfully by me With help from Eric Dumazet. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05ipv6: Copy cork options in ip6_append_dataHerbert Xu
As the options passed to ip6_append_data may be ephemeral, we need to duplicate it for corking. This patch applies the simplest fix which is to memdup all the relevant bits. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10netns: ip6mr: allocate mroute6_socket per-namespace.Benjamin Thery
Preliminary work to make IPv6 multicast forwarding netns-aware. Make IPv6 multicast forwarding mroute6_socket per-namespace, moves it into struct netns_ipv6. At the moment, mroute6_socket is only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28net: reduce structures when XFRM=nAlexey Dobriyan
ifdef out * struct sk_buff::sp (pointer) * struct dst_entry::xfrm (pointer) * struct sock::sk_policy (2 pointers) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08ipv6: added net argument to ICMP6MSGOUT_INC_STATS_BHDenis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08ipv6: added net argument to ICMP6_INC_STATS_BHDenis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08ipv6: added net argument to IP6_INC_STATS_BHDenis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08netns: add net parameter to IP6_INC_STATSDenis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08ipv6: local dev is actually unused in ip6_fragmentDenis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-09ipv6: Fix OOPS in ip6_dst_lookup_tail().Neil Horman
This fixes kernel bugzilla 11469: "TUN with 1024 neighbours: ip6_dst_lookup_tail NULL crash" dst->neighbour is not necessarily hooked up at this point in the processing path, so blindly dereferencing it is the wrong thing to do. This NULL check exists in other similar paths and this case was just an oversight. Also fix the completely wrong and confusing indentation here while we're at it. Based upon a patch by Evgeniy Polyakov. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-14netns: Add network namespace argument to rt6_fill_node() and ↵Brian Haley
ipv6_dev_get_saddr() ipv6_dev_get_saddr() blindly de-references dst_dev to get the network namespace, but some callers might pass NULL. Change callers to pass a namespace pointer instead. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-03ipv6: Do not drop packet if skb->local_df is set to trueWei Yongjun
The old code will drop IPv6 packet if ipfragok is not set, since ipfragok is obsoleted, will be instead by used skb->local_df, so this check must be changed to skb->local_df. This patch fix this problem and not drop packet if skb->local_df is set to true. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-31ipv6: Fix ip6_xmit to send fragments if ipfragok is trueWei Yongjun
SCTP used ip6_xmit() to send fragments after received ICMP packet too big message. But while send packet used ip6_xmit, the skb->local_df is not initialized. So when skb if enter ip6_fragment(), the following code will discard the skb. ip6_fragment(...) { if (!skb->local_df) { ... return -EMSGSIZE; } ... } SCTP do the following step: 1. send packet ip6_xmit(skb, ipfragok=0) 2. received ICMP packet too big message 3. if PMTUD_ENABLE: ip6_xmit(skb, ipfragok=1) This patch fixed the problem by set local_df if ipfragok is true. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-25net: convert BUG_TRAP to generic WARN_ONIlpo Järvinen
Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19ipv6 netns: Make several "global" sysctl variables namespace aware.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-03ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03ipv6: Do not forward packets with the unspecified source address.YOSHIFUJI Hideaki
RFC4291 2.5.2. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-19net: Discard and warn about LRO'd skbs received for forwardingBen Hutchings
Add skb_warn_if_lro() to test whether an skb was received with LRO and warn if so. Change br_forward(), ip_forward() and ip6_forward() to call it) and discard the skb if it returns true. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-11net: remove CVS keywordsAdrian Bunk
This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12net: Allow netdevices to specify needed head/tailroomJohannes Berg
This patch adds needed_headroom/needed_tailroom members to struct net_device and updates many places that allocate sbks to use them. Not all of them can be converted though, and I'm sure I missed some (I mostly grepped for LL_RESERVED_SPACE) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-12[IPV6]: Make address arguments const.YOSHIFUJI Hideaki
- net/ipv6/addrconf.c: ipv6_get_ifaddr(), ipv6_dev_get_saddr() - net/ipv6/mcast.c: ipv6_sock_mc_join(), ipv6_sock_mc_drop(), inet6_mc_check(), ipv6_dev_mc_inc(), __ipv6_dev_mc_dec(), ipv6_dev_mc_dec(), ipv6_chk_mcast_addr() - net/ipv6/route.c: rt6_lookup(), icmp6_dst_alloc() - net/ipv6/ip6_output.c: ip6_nd_hdr() - net/ipv6/ndisc.c: ndisc_send_ns(), ndisc_send_rs(), ndisc_send_redirect(), ndisc_get_neigh(), __ndisc_send() Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-05[IPV6] MROUTE: Support multicast forwarding.YOSHIFUJI Hideaki
Based on ancient patch by Mickael Hoerdt <hoerdt@clarinet.u-strasbg.fr>, which is available at <http://www-r2.u-strasbg.fr/~hoerdt/dev/linux_ipv6_mforwarding/patch-linux-ipv6-mforwarding-0.1a>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS.YOSHIFUJI Hideaki
Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.YOSHIFUJI Hideaki
Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25[IPV6]: Support Source Address Selection API (RFC5014).YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25[IPV6]: Optimize hop-limit determination.YOSHIFUJI Hideaki
Last part of hop-limit determination is always: hoplimit = dst_metric(dst, RTAX_HOPLIMIT); if (hoplimit < 0) hoplimit = ipv6_get_hoplimit(dst->dev). Let's consolidate it as ip6_dst_hoplimit(dst). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25[IPV4,IPV6]: Share cork.rt between IPv4 and IPv6.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-07[NETNS][IPV6] fix some missing namespaceDaniel Lezcano
This patch adds some missing namespace Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05[NETNS][IPV6] route6 - pass always a valid socket to ip6_dst_lookupDaniel Lezcano
The ip6_dst_lookup receive a socket as parameter. In some part of the code it is called with a NULL socket parameter. We want to rely on the socket to retrieve the network namespace, so we always pass a valid socket in all cases. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05[NETNS][IPV6] route6 - add netns parameter to ip6_route_outputDaniel Lezcano
Add an netns parameter to ip6_route_output. That will allow to access to the right routing table for outgoing traffic. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04[IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr().YOSHIFUJI Hideaki
Since most users of ipv6_get_saddr() pass non-NULL as dst argument, use ipv6_dev_get_saddr() directly. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-02-28[IPV6]: Unexport ip6_find_1stfragoptAdrian Bunk
This patch removes the no longer used EXPORT_SYMBOL_GPL(ip6_find_1stfragopt). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-14[IPV6]: Fix reversed local_df test in ip6_fragmentHerbert Xu
I managed to reverse the local_df test when forward-porting this patch so it actually makes things worse by never fragmenting at all. Thanks to David Stevens for testing and reporting this bug. Bill Fink pointed out that the local_df setting is also the wrong way around. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12[IPV6]: Fix IPsec datagram fragmentationHerbert Xu
This is a long-standing bug in the IPsec IPv6 code that breaks when we emit a IPsec tunnel-mode datagram packet. The problem is that the code the emits the packet assumes the IPv6 stack will fragment it later, but the IPv6 stack assumes that whoever is emitting the packet is going to pre-fragment the packet. In the long term we need to fix both sides, e.g., to get the datagram code to pre-fragment as well as to get the IPv6 stack to fragment locally generated tunnel-mode packet. For now this patch does the second part which should make it work for the IPsec host case. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31[NET]: Introducing socket mark socket option.Laszlo Attila Toth
A userspace program may wish to set the mark for each packets its send without using the netfilter MARK target. Changing the mark can be used for mark based routing without netfilter or for packet filtering. It requires CAP_NET_ADMIN capability. Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31[INET]: Prevent out-of-sync truesize on ip_fragment slow pathHerbert Xu
When ip_fragment has to hit the slow path the value of skb->truesize may go out of sync because we would have updated it without changing the packet length. This violates the constraints on truesize. This patch postpones the update of skb->truesize to prevent this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][IPV6]: inet6_addr - ipv6_get_ifaddr namespace awareDaniel Lezcano
The inet6_addr_lst is browsed taking into account the network namespace specified as parameter. If an address does not belong to the specified namespace, it is ignored. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Modify the neighbour table code so it handles multiple network ↵Eric W. Biederman
namespaces I'm actually surprised at how much was involved. At first glance it appears that the neighbour table data structures are already split by network device so all that should be needed is to modify the user interface commands to filter the set of neighbours by the network namespace of their devices. However a couple things turned up while I was reading through the code. The proxy neighbour table allows entries with no network device, and the neighbour parms are per network device (except for the defaults) so they now need a per network namespace default. So I updated the two structures (which surprised me) with their very own network namespace parameter. Updated the relevant lookup and destroy routines with a network namespace parameter and modified the code that interacts with users to filter out neighbour table entries for devices of other namespaces. I'm a little concerned that we can modify and display the global table configuration and from all network namespaces. But this appears good enough for now. I keep thinking modifying the neighbour table to have per network namespace instances of each table type would should be cleaner. The hash table is already dynamically sized so there are it is not a limiter. The default parameter would be straight forward to take care of. However when I look at the how the network table is built and used I still find some assumptions that there is only a single neighbour table for each type of table in the kernel. The netlink operations, neigh_seq_start, the non-core network users that call neigh_lookup. So while it might be doable it would require more refactoring than my current approach of just doing a little extra filtering in the code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[XFRM] IPv6: Fix dst/routing check at transformation.Masahide NAKAMURA
IPv6 specific thing is wrongly removed from transformation at net-2.6.25. This patch recovers it with current design. o Update "path" of xfrm_dst since IPv6 transformation should care about routing changes. It is required by MIPv6 and off-link destined IPsec. o Rename nfheader_len which is for non-fragment transformation used by MIPv6 to rt6i_nfheader_len as IPv6 name space. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: Introduce NF_INET_ hook valuesPatrick McHardy
The IPv4 and IPv6 hook values are identical, yet some code tries to figure out the "correct" value by looking at the address family. Introduce NF_INET_* values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__ section for userspace compatibility. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV6]: Add ip6_local_outHerbert Xu
Most callers of the LOCAL_OUT chain will set the IP packet length before doing so. They also share the same output function dst_output. This patch creates a new function called ip6_local_out which does all of that and converts the appropriate users over to it. Apart from removing duplicate code, it will also help in merging the IPsec output path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV6]: Move nfheader_len into rt6_infoHerbert Xu
The dst member nfheader_len is only used by IPv6. It's also currently creating a rather ugly alignment hole in struct dst. Therefore this patch moves it from there into struct rt6_info. It also reorders the fields in rt6_info to minimize holes. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPV6]: Only set nfheader_len for top xfrm dstHerbert Xu
We only need to set nfheader_len in the top xfrm dst. This is because we only ever read the nfheader_len from the top xfrm dst. It is also easier to count nfheader_len as part of header_len which then lets us remove the ugly wrapper functions for incrementing and decrementing header lengths in xfrm6_policy.c. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-23[INET]: Fix truesize setting in ip_append_dataHerbert Xu
As it is ip_append_data only counts page fragments to the skb that allocated it. As such it means that the first skb gets hit with a 4K charge even though it might have only used a fraction of it while all subsequent skb's that use the same page gets away with no charge at all. This bug was exposed by the UDP accounting patch. [ The wmem_alloc bumping needs to be moved with the truesize, noticed by Takahiro Yasui. -DaveM ] Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-07[IPv6] SNMP: Increment OutNoRoutes when connecting to unreachable networkMitsuru Chinen
IPv6 stack doesn't increment OutNoRoutes counter when IP datagrams is being discarded because no route could be found to transmit them to their destination. IPv6 stack should increment the counter. Incidentally, IPv4 stack increments that counter in such situation. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07[IPV6]: Consolidate the ip cork destruction in ip6_output.cPavel Emelyanov
The ip6_push_pending_frames and ip6_flush_pending_frames do the same things to flush the sock's cork. Move this into a separate function and save ~100 bytes from the .text Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-23[NET]: Treat the sign of the result of skb_headroom() consistentlyChuck Lever
In some places, the result of skb_headroom() is compared to an unsigned integer, and in others, the result is compared to a signed integer. Make the comparisons consistent and correct. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>