aboutsummaryrefslogtreecommitdiff
path: root/net/ipv6/ip6_output.c
AgeCommit message (Collapse)Author
2009-09-02ip: Report qdisc packet dropsEric Dumazet
Christoph Lameter pointed out that packet drops at qdisc level where not accounted in SNMP counters. Only if application sets IP_RECVERR, drops are reported to user (-ENOBUFS errors) and SNMP counters updated. IP_RECVERR is used to enable extended reliable error message passing, but these are not needed to update system wide SNMP stats. This patch changes things a bit to allow SNMP counters to be updated, regardless of IP_RECVERR being set or not on the socket. Example after an UDP tx flood # netstat -s ... IP: 1487048 outgoing packets dropped ... Udp: ... SndbufErrors: 1487048 send() syscalls, do however still return an OK status, to not break applications. Note : send() manual page explicitly says for -ENOBUFS error : "The output queue for a network interface was full. This generally indicates that the interface has stopped sending, but may be caused by transient congestion. (Normally, this does not occur in Linux. Packets are just silently dropped when a device queue overflows.) " This is not true for IP_RECVERR enabled sockets : a send() syscall that hit a qdisc drop returns an ENOBUFS error. Many thanks to Christoph, David, and last but not least, Alexey ! Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01ipv6: ip6_push_pending_frames() should increment IPSTATS_MIB_OUTDISCARDSEric Dumazet
qdisc drops should be notified to IP_RECVERR enabled sockets, as done in IPV4. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-13inet6: Conversion from u8 to intGerrit Renker
This replaces assignments of the type "int on LHS" = "u8 on RHS" with simpler code. The LHS can express all of the unsigned right hand side values, hence the assigned value can not be negative. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-16Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/orinoco/main.c
2009-07-12udpv6: Remove unused skb argument of ipv6_select_ident()Sridhar Samudrala
- move ipv6_select_ident() inline function to ipv6.h and remove the unused skb argument Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12udpv6: Fix gso_size setting in ip6_ufo_append_dataSridhar Samudrala
- fix gso_size setting for ipv6 fragment to be a multiple of 8 bytes. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-11net: ip_push_pending_frames() fixEric Dumazet
After commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) we do not take any more references on sk->sk_refcnt on outgoing packets. I forgot to delete two __sock_put() from ip_push_pending_frames() and ip6_push_pending_frames(). Reported-by: Emil S Tantilov <emils.tantilov@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Emil S Tantilov <emils.tantilov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11net: No more expensive sock_hold()/sock_put() on each txEric Dumazet
One of the problem with sock memory accounting is it uses a pair of sock_hold()/sock_put() for each transmitted packet. This slows down bidirectional flows because the receive path also needs to take a refcount on socket and might use a different cpu than transmit path or transmit completion path. So these two atomic operations also trigger cache line bounces. We can see this in tx or tx/rx workloads (media gateways for example), where sock_wfree() can be in top five functions in profiles. We use this sock_hold()/sock_put() so that sock freeing is delayed until all tx packets are completed. As we also update sk_wmem_alloc, we could offset sk_wmem_alloc by one unit at init time, until sk_free() is called. Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc) to decrement initial offset and atomicaly check if any packets are in flight. skb_set_owner_w() doesnt call sock_hold() anymore sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc reached 0 to perform the final freeing. Drawback is that a skb->truesize error could lead to unfreeable sockets, or even worse, prematurely calling __sk_free() on a live socket. Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt contention point. 5 % speedup on a UDP transmit workload (depends on number of flows), lowering TX completion cpu usage. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09ipv6: Use frag list abstraction interfaces.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03net: skb->dst accessorsEric Dumazet
Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27snmp: add missing counters for RFC 4293Neil Horman
The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and OutMcastOctets: http://tools.ietf.org/html/rfc4293 But it seems we don't track those in any way that easy to separate from other protocols. This patch adds those missing counters to the stats file. Tested successfully by me With help from Eric Dumazet. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05ipv6: Copy cork options in ip6_append_dataHerbert Xu
As the options passed to ip6_append_data may be ephemeral, we need to duplicate it for corking. This patch applies the simplest fix which is to memdup all the relevant bits. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10netns: ip6mr: allocate mroute6_socket per-namespace.Benjamin Thery
Preliminary work to make IPv6 multicast forwarding netns-aware. Make IPv6 multicast forwarding mroute6_socket per-namespace, moves it into struct netns_ipv6. At the moment, mroute6_socket is only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28net: reduce structures when XFRM=nAlexey Dobriyan
ifdef out * struct sk_buff::sp (pointer) * struct dst_entry::xfrm (pointer) * struct sock::sk_policy (2 pointers) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08ipv6: added net argument to ICMP6MSGOUT_INC_STATS_BHDenis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08ipv6: added net argument to ICMP6_INC_STATS_BHDenis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08ipv6: added net argument to IP6_INC_STATS_BHDenis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08netns: add net parameter to IP6_INC_STATSDenis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08ipv6: local dev is actually unused in ip6_fragmentDenis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-09ipv6: Fix OOPS in ip6_dst_lookup_tail().Neil Horman
This fixes kernel bugzilla 11469: "TUN with 1024 neighbours: ip6_dst_lookup_tail NULL crash" dst->neighbour is not necessarily hooked up at this point in the processing path, so blindly dereferencing it is the wrong thing to do. This NULL check exists in other similar paths and this case was just an oversight. Also fix the completely wrong and confusing indentation here while we're at it. Based upon a patch by Evgeniy Polyakov. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-14netns: Add network namespace argument to rt6_fill_node() and ↵Brian Haley
ipv6_dev_get_saddr() ipv6_dev_get_saddr() blindly de-references dst_dev to get the network namespace, but some callers might pass NULL. Change callers to pass a namespace pointer instead. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-03ipv6: Do not drop packet if skb->local_df is set to trueWei Yongjun
The old code will drop IPv6 packet if ipfragok is not set, since ipfragok is obsoleted, will be instead by used skb->local_df, so this check must be changed to skb->local_df. This patch fix this problem and not drop packet if skb->local_df is set to true. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-31ipv6: Fix ip6_xmit to send fragments if ipfragok is trueWei Yongjun
SCTP used ip6_xmit() to send fragments after received ICMP packet too big message. But while send packet used ip6_xmit, the skb->local_df is not initialized. So when skb if enter ip6_fragment(), the following code will discard the skb. ip6_fragment(...) { if (!skb->local_df) { ... return -EMSGSIZE; } ... } SCTP do the following step: 1. send packet ip6_xmit(skb, ipfragok=0) 2. received ICMP packet too big message 3. if PMTUD_ENABLE: ip6_xmit(skb, ipfragok=1) This patch fixed the problem by set local_df if ipfragok is true. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-25net: convert BUG_TRAP to generic WARN_ONIlpo Järvinen
Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19ipv6 netns: Make several "global" sysctl variables namespace aware.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-03ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03ipv6: Do not forward packets with the unspecified source address.YOSHIFUJI Hideaki
RFC4291 2.5.2. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-19net: Discard and warn about LRO'd skbs received for forwardingBen Hutchings
Add skb_warn_if_lro() to test whether an skb was received with LRO and warn if so. Change br_forward(), ip_forward() and ip6_forward() to call it) and discard the skb if it returns true. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-11net: remove CVS keywordsAdrian Bunk
This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12net: Allow netdevices to specify needed head/tailroomJohannes Berg
This patch adds needed_headroom/needed_tailroom members to struct net_device and updates many places that allocate sbks to use them. Not all of them can be converted though, and I'm sure I missed some (I mostly grepped for LL_RESERVED_SPACE) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-12[IPV6]: Make address arguments const.YOSHIFUJI Hideaki
- net/ipv6/addrconf.c: ipv6_get_ifaddr(), ipv6_dev_get_saddr() - net/ipv6/mcast.c: ipv6_sock_mc_join(), ipv6_sock_mc_drop(), inet6_mc_check(), ipv6_dev_mc_inc(), __ipv6_dev_mc_dec(), ipv6_dev_mc_dec(), ipv6_chk_mcast_addr() - net/ipv6/route.c: rt6_lookup(), icmp6_dst_alloc() - net/ipv6/ip6_output.c: ip6_nd_hdr() - net/ipv6/ndisc.c: ndisc_send_ns(), ndisc_send_rs(), ndisc_send_redirect(), ndisc_get_neigh(), __ndisc_send() Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-05[IPV6] MROUTE: Support multicast forwarding.YOSHIFUJI Hideaki
Based on ancient patch by Mickael Hoerdt <hoerdt@clarinet.u-strasbg.fr>, which is available at <http://www-r2.u-strasbg.fr/~hoerdt/dev/linux_ipv6_mforwarding/patch-linux-ipv6-mforwarding-0.1a>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS.YOSHIFUJI Hideaki
Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.YOSHIFUJI Hideaki
Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25[IPV6]: Support Source Address Selection API (RFC5014).YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25[IPV6]: Optimize hop-limit determination.YOSHIFUJI Hideaki
Last part of hop-limit determination is always: hoplimit = dst_metric(dst, RTAX_HOPLIMIT); if (hoplimit < 0) hoplimit = ipv6_get_hoplimit(dst->dev). Let's consolidate it as ip6_dst_hoplimit(dst). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25[IPV4,IPV6]: Share cork.rt between IPv4 and IPv6.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-07[NETNS][IPV6] fix some missing namespaceDaniel Lezcano
This patch adds some missing namespace Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05[NETNS][IPV6] route6 - pass always a valid socket to ip6_dst_lookupDaniel Lezcano
The ip6_dst_lookup receive a socket as parameter. In some part of the code it is called with a NULL socket parameter. We want to rely on the socket to retrieve the network namespace, so we always pass a valid socket in all cases. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05[NETNS][IPV6] route6 - add netns parameter to ip6_route_outputDaniel Lezcano
Add an netns parameter to ip6_route_output. That will allow to access to the right routing table for outgoing traffic. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04[IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr().YOSHIFUJI Hideaki
Since most users of ipv6_get_saddr() pass non-NULL as dst argument, use ipv6_dev_get_saddr() directly. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-02-28[IPV6]: Unexport ip6_find_1stfragoptAdrian Bunk
This patch removes the no longer used EXPORT_SYMBOL_GPL(ip6_find_1stfragopt). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-14[IPV6]: Fix reversed local_df test in ip6_fragmentHerbert Xu
I managed to reverse the local_df test when forward-porting this patch so it actually makes things worse by never fragmenting at all. Thanks to David Stevens for testing and reporting this bug. Bill Fink pointed out that the local_df setting is also the wrong way around. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12[IPV6]: Fix IPsec datagram fragmentationHerbert Xu
This is a long-standing bug in the IPsec IPv6 code that breaks when we emit a IPsec tunnel-mode datagram packet. The problem is that the code the emits the packet assumes the IPv6 stack will fragment it later, but the IPv6 stack assumes that whoever is emitting the packet is going to pre-fragment the packet. In the long term we need to fix both sides, e.g., to get the datagram code to pre-fragment as well as to get the IPv6 stack to fragment locally generated tunnel-mode packet. For now this patch does the second part which should make it work for the IPsec host case. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31[NET]: Introducing socket mark socket option.Laszlo Attila Toth
A userspace program may wish to set the mark for each packets its send without using the netfilter MARK target. Changing the mark can be used for mark based routing without netfilter or for packet filtering. It requires CAP_NET_ADMIN capability. Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31[INET]: Prevent out-of-sync truesize on ip_fragment slow pathHerbert Xu
When ip_fragment has to hit the slow path the value of skb->truesize may go out of sync because we would have updated it without changing the packet length. This violates the constraints on truesize. This patch postpones the update of skb->truesize to prevent this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS][IPV6]: inet6_addr - ipv6_get_ifaddr namespace awareDaniel Lezcano
The inet6_addr_lst is browsed taking into account the network namespace specified as parameter. If an address does not belong to the specified namespace, it is ignored. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETNS]: Modify the neighbour table code so it handles multiple network ↵Eric W. Biederman
namespaces I'm actually surprised at how much was involved. At first glance it appears that the neighbour table data structures are already split by network device so all that should be needed is to modify the user interface commands to filter the set of neighbours by the network namespace of their devices. However a couple things turned up while I was reading through the code. The proxy neighbour table allows entries with no network device, and the neighbour parms are per network device (except for the defaults) so they now need a per network namespace default. So I updated the two structures (which surprised me) with their very own network namespace parameter. Updated the relevant lookup and destroy routines with a network namespace parameter and modified the code that interacts with users to filter out neighbour table entries for devices of other namespaces. I'm a little concerned that we can modify and display the global table configuration and from all network namespaces. But this appears good enough for now. I keep thinking modifying the neighbour table to have per network namespace instances of each table type would should be cleaner. The hash table is already dynamically sized so there are it is not a limiter. The default parameter would be straight forward to take care of. However when I look at the how the network table is built and used I still find some assumptions that there is only a single neighbour table for each type of table in the kernel. The netlink operations, neigh_seq_start, the non-core network users that call neigh_lookup. So while it might be doable it would require more refactoring than my current approach of just doing a little extra filtering in the code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[XFRM] IPv6: Fix dst/routing check at transformation.Masahide NAKAMURA
IPv6 specific thing is wrongly removed from transformation at net-2.6.25. This patch recovers it with current design. o Update "path" of xfrm_dst since IPv6 transformation should care about routing changes. It is required by MIPv6 and off-link destined IPsec. o Rename nfheader_len which is for non-fragment transformation used by MIPv6 to rt6i_nfheader_len as IPv6 name space. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: Introduce NF_INET_ hook valuesPatrick McHardy
The IPv4 and IPv6 hook values are identical, yet some code tries to figure out the "correct" value by looking at the address family. Introduce NF_INET_* values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__ section for userspace compatibility. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>