path: root/net/ipv6/route.c
Age  Commit message  Author
2008-03-03  [NETNS][IPV6] ip6_fib - fib6_clean_all handle several network namespaces  Daniel Lezcano
The function fib6_clean_all takes the network namespace as a parameter. That allows flushing the routes related to a specific network namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03  [NETNS][IPV6] ip6_fib - make it per network namespace  Daniel Lezcano
The fib tables for ipv6 are moved to the network namespace structure. All references to them are made relative to the network namespace. All external calls to the ip6_fib functions taking the network namespace parameter are made using the init_net variable, so the ip6_fib engine is ready for the namespaces but the callers are not yet. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04  [IPV6]: Make ndisc_dst_alloc() common for later use.  YOSHIFUJI Hideaki
For later use, this patch renames ndisc_dst_alloc() (and related functions/structures) to icmp6_dst_alloc() (and so on). This patch also removes the unused function-pointer argument for it. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04  [IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr().  YOSHIFUJI Hideaki
Since most users of ipv6_get_saddr() pass non-NULL as dst argument, use ipv6_dev_get_saddr() directly. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04  [IPV6] SYSCTL: complete initialization for sysctl table in subsystem code.  YOSHIFUJI Hideaki
Move initialization bits for subsystem sysctl tables to appropriate functions.
 - route
 - icmp
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-02-26  [IPV6]: Add missing initializations of the new nl_info.nl_net field  Benjamin Thery
Add some more missing initializations of the new nl_info.nl_net field in IPv6 stack. This field will be used when network namespaces are fully supported. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-09  [IPV6]: Replace using the magic constant "1024" with IP6_RT_PRIO_USER for fc_metric.  Rami Rosen
This patch replaces the explicit usage of the magic constant "1024" with IP6_RT_PRIO_USER in the IPV6 tree. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31  [IPV6]: Update MSS even if MTU is unchanged.  Jim Paris
This is needed because in ndisc.c, we have:
    static void ndisc_router_discovery(struct sk_buff *skb)
    {
    // ...
        if (ndopts.nd_opts_mtu) {
    // ...
                if (rt)
                    rt->u.dst.metrics[RTAX_MTU-1] = mtu;
                rt6_mtu_change(skb->dev, mtu);
    // ...
    }
Since the mtu is set directly here, rt6_mtu_change_route thinks that it is unchanged, and so it fails to update the MSS accordingly. This patch lets rt6_mtu_change_route still update MSS if old_mtu == new_mtu.
Signed-off-by: Jim Paris <jim@jtan.com> Signed-off-by: David S. Miller <davem@davemloft.net>
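The effect of this change can be sketched with a small userspace model. The struct, the helper names, the `strict` switch, and the flat 60-byte overhead (40-byte IPv6 header plus 20-byte TCP header, without the clamping the kernel applies) are all assumptions made for illustration; this is not the code from route.c.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the route metrics discussed above. */
struct route_model {
    unsigned int mtu;     /* models metrics[RTAX_MTU-1] */
    unsigned int advmss;  /* models the advertised MSS metric */
};

/*
 * Assumed overhead: 40-byte IPv6 header + 20-byte TCP header, no clamping.
 * Callers are expected to pass an MTU of at least 1280 (the IPv6 minimum).
 */
static unsigned int model_advmss(unsigned int mtu)
{
    return mtu - 60;
}

/*
 * Model of the per-route MTU update. With strict == true a route whose MTU
 * already equals new_mtu is skipped (the behaviour the commit describes as
 * broken); with strict == false an equal MTU still refreshes the MSS.
 */
static void model_mtu_change(struct route_model *rt, unsigned int new_mtu,
                             bool strict)
{
    bool update = strict ? rt->mtu > new_mtu : rt->mtu >= new_mtu;

    if (update) {
        rt->mtu = new_mtu;
        rt->advmss = model_advmss(new_mtu);
    }
}
```

In this model, a route whose MTU was already written directly (as ndisc.c does) keeps a stale MSS under the strict comparison and gets a fresh one under the relaxed comparison.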
2008-01-31  [NET]: should explicitly initialize atomic_t field in struct dst_ops  Eric Dumazet
All but one struct dst_ops static initializations miss explicit initialization of entries field. As this field is atomic_t, we should use ATOMIC_INIT(0), and not rely on atomic_t implementation. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31  [NETNS]: Add missing initialization of nl_info.nl_net in rtm_to_fib6_config()  Benjamin Thery
Add missing initialization of the new nl_info.nl_net field in rtm_to_fib6_config(). This will be needed to store the network namespace associated with the fib6_config struct. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [NETNS][DST] dst: pass the dst_ops as parameter to the gc functions  Daniel Lezcano
The garbage collection functions receive the dst_ops structure as a parameter. This is useful for the next incoming patchset because it will need the dst_ops (there will be several instances) and the network namespace pointer (contained in the dst_ops). The protocols which do not take care of the namespaces will not be impacted by this change (except for the function signature); they just ignore the parameter. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [IPV6] route: kill some bloat  Ilpo Järvinen
net/ipv6/route.c:
  ip6_pkt_prohibit_out | -130
  ip6_pkt_discard      | -261
  ip6_pkt_discard_out  | -130
  ip6_pkt_prohibit     | -261
 4 functions changed, 782 bytes removed, diff: -782
net/ipv6/route.c:
  ip6_pkt_drop | +300
 1 function changed, 300 bytes added, diff: +300
net/ipv6/route.o:
 5 functions changed, 300 bytes added, 782 bytes removed, diff: -482
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [NETNS]: Add netns to nl_info structure.  Denis V. Lunev
nl_info is used to track the end-user destination of routing change notification. This is a natural object to hold a namespace on. Place it there and utilize the context in the appropriate places. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [NETNS][IPV6]: Make sysctls route per namespace.  Daniel Lezcano
All the sysctls concerning the routes are moved to the network namespace structure. A helper function is called to initialize the variables. Because the ipv6 protocol is not yet per namespace, the variables are accessed relative to the network namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [NETNS][IPV6]: Make multiple instance of sysctl tables.  Daniel Lezcano
Each network namespace wants its own set of sysctl values, e.g. we should not be able from a namespace to set a sysctl value for another namespace, especially for the initial network namespace. This patch duplicates the sysctl table when we register a new network namespace for ipv6. The duplicated tables are postfixed with the word "template" to notify the developer that the table is cloned. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [IPV6]: Always pass a valid nl_info to inet6_rt_notify.  Denis V. Lunev
This makes the code in inet6_rt_notify more straightforward and provides ground for namespace passing. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [IPV6]: route6 remove ifdef for fib_rules  Daniel Lezcano
The patch defines the usual static inline functions when the code is disabled for fib6_rules. That allows removing some ifdefs in route.c and makes the code a little clearer. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [IPV6]: remove ifdef in route6 for xfrm6  Daniel Lezcano
The following patch creates the usual static inline functions to disable the xfrm6_init and xfrm6_fini functions when XFRM is off. That allows removing some ifdefs and makes the code a little clearer. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
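The "usual static inline functions" technique mentioned in the two commits above can be sketched as follows. `CONFIG_DEMO_RULES` and every `demo_*` name are hypothetical placeholders, not the identifiers used in the kernel tree.

```c
/*
 * CONFIG_DEMO_RULES is a hypothetical option; leave it undefined to model
 * a build in which the optional feature is compiled out.
 */
#ifdef CONFIG_DEMO_RULES
int demo_rules_init(void);
void demo_rules_cleanup(void);
#else
/* Feature disabled: no-op stubs with the same signatures. */
static inline int demo_rules_init(void) { return 0; }
static inline void demo_rules_cleanup(void) { }
#endif

/* The caller needs no #ifdef: it uses the same names in both builds. */
static int demo_route_init(void)
{
    int ret = demo_rules_init();

    if (ret)
        return ret;
    /* ... further initialization would go here ... */
    return 0;
}

static void demo_route_cleanup(void)
{
    demo_rules_cleanup();
}
```

Because the stubs are `static inline` and do nothing, the compiler can drop the calls entirely when the option is off, so the caller stays free of conditional compilation at no runtime cost.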
2008-01-28  [IPV6]: create route6 proc init-fini functions  Daniel Lezcano
Make the proc creation/destruction a separate function. That allows removing the #ifdef CONFIG_PROC_FS in the init/fini functions and makes them more readable. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [IPV6] route6/fib6: Don't panic a kmem_cache_create.  Daniel Lezcano
If kmem_cache_create fails, the kernel will panic. That is acceptable if the system is booting, but if the ipv6 protocol is compiled as a module and is loaded after the system has booted, do we want to panic instead of just failing to initialize the protocol? The init function now returns an error, and this error is checked during protocol initialization. So the ipv6 protocol will fail safely. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [IPV6]: Make ip6_route_init to return an error code.  Daniel Lezcano
The route initialization function does not return any value to notify if the initialization is successful or not. This patch checks all calls made for the initialization in order to return a value to the caller. Unfortunately, proc_net_fops_create will return a NULL pointer if CONFIG_PROC_FS is off, so we can not check the return code without an ifdef CONFIG_PROC_FS block in the ip6_route_init function. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
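The two commits above move from "panic on failure" to "return an error and let the caller decide". A minimal userspace sketch of that control flow is shown here; every `demo_*` name is hypothetical, `malloc` merely stands in for the slab cache creation, and `demo_fail_at` is a test hook that exists only in this sketch.

```c
#include <errno.h>
#include <stdlib.h>

/* All names below are hypothetical; they only model the control flow. */
static void *demo_cache;     /* stands in for the slab cache */
static int demo_fail_at;     /* test hook: 1 = cache step, 2 = fib step, 0 = none */
static int demo_fib_ready;

static int demo_fib_init(void)
{
    if (demo_fail_at == 2)
        return -ENOMEM;
    demo_fib_ready = 1;
    return 0;
}

/* Returns 0 on success or a negative errno; it never aborts the program. */
static int demo_route_init(void)
{
    int ret;

    demo_cache = (demo_fail_at == 1) ? NULL : malloc(64);
    if (!demo_cache)
        return -ENOMEM;

    ret = demo_fib_init();
    if (ret)
        goto out_cache;

    return 0;

out_cache:
    /* Undo the earlier step so a failed init leaves nothing behind. */
    free(demo_cache);
    demo_cache = NULL;
    return ret;
}
```

A caller, such as a module's init function in this model, can then propagate the negative value and refuse to load instead of bringing the whole system down.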
2008-01-28  [NET]: Multiple namespaces in the all dst_ifdown routines.  Denis V. Lunev
Move dst entries to a namespace loopback to catch refcounting leaks. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [IPV6]: Add RFC4214 support  Fred L. Templin
This patch includes support for the Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) per RFC4214. It uses the SIT module, and is configured using extensions to the "iproute2" utility. The diffs are specific to the Linux 2.6.24-rc2 kernel distribution. This version includes the diff for ./include/linux/if.h which was missing in the v2.4 submission and is needed to make the patch compile. The patch has been installed, compiled and tested in a clean 2.6.24-rc2 kernel build area. Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [NET]: Make rtnetlink infrastructure network namespace aware (v3)  Denis V. Lunev
After this patch none of the netlink callbacks support anything except the initial network namespace, but the rtnetlink infrastructure now handles multiple network namespaces.
Changes from v2:
 - IPv6 addrlabel processing
Changes from v1:
 - no need for special rtnl_unlock handling
 - fixed IPv6 ndisc
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [NET]: Modify all rtnetlink methods to only work in the initial namespace (v2)  Denis V. Lunev
Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnetlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled.
Changes from v1:
 - added IPv6 addrlabel protection
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-01-28  [IPSEC]: Merge most of the output path  Herbert Xu
As part of the work on asynchronous cryptographic operations, we need to be able to resume from the spot where they occur. As such, it helps if we isolate them to one spot. This patch moves most of the remaining family-specific processing into the common output code. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28  [NET]: Eliminate duplicate copies of dst_discard  Herbert Xu
We have a number of copies of dst_discard scattered around the place which all do the same thing, namely free a packet on the input or output paths. This patch deletes all of them except dst_discard and points all the users to it. The only non-trivial bit is decnet where it returns an error. However, conceptually this is identical to the blackhole functions used in IPv4 and IPv6 which do not return errors. So they should either all return errors or all return zero. For now I've stuck with the majority and picked zero as the return value. It doesn't really matter in practice since few if any drivers would react differently depending on a zero return value or NET_RX_DROP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20  [IPV6] ROUTE: Make sending algorithm more friendly with RFC 4861.  YOSHIFUJI Hideaki
We omit (or delay) sending NSes for known-to-unreachable routers (in NUD_FAILED state) according to RFC 4191 (Default Router Preferences and More-Specific Routes). But this is not fully compatible with RFC 4861 (Neighbor Discovery Protocol for IPv6), which does not remember unreachability of neighbors. So, let's avoid mixing sending algorithm of RFC 4191 and that of RFC 4861, and make the algorithm more friendly with RFC 4861 if RFC 4191 is disabled. Issue was found by IPv6 Ready Logo Core Self_Test 1.5.0b2 (by TAHI Project), and has been tracked down by Mitsuru Chinen <mitch@linux.vnet.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-10  [NET]: Make helper to get dst entry and "use" it  Pavel Emelyanov
There are many places that get the dst entry, increase the __use counter and set the "lastuse" time stamp. Make a helper for this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
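The helper described above bundles three steps that were previously open-coded at each call site. A rough userspace sketch follows; `struct demo_dst` and `demo_dst_use` are hypothetical names, and the plain `int` reference count is a simplification (the kernel uses an atomic counter).

```c
/* Hypothetical userspace stand-in for a dst entry; not the kernel struct. */
struct demo_dst {
    int refcnt;             /* reference count (atomic in the kernel) */
    unsigned long use;      /* models the __use counter */
    unsigned long lastuse;  /* time stamp of the most recent use */
};

/* Take a reference, bump the use counter, and record the time in one call. */
static inline void demo_dst_use(struct demo_dst *dst, unsigned long now)
{
    dst->refcnt++;
    dst->use++;
    dst->lastuse = now;
}
```

Call sites shrink to a single line, and the three updates can no longer drift apart between callers.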
2007-11-07  [IPV6]: Convert /proc/net/ipv6_route to seq_file interface  Alexey Dobriyan
This removes last proc_net_create() user. Kudos to Benjamin Thery and Stephen Hemminger for comments on previous version. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-18  sysctl: ipv6 route flushing (kill binary path)  Eric W. Biederman
We don't properly support the sysctl binary path for flushing the ipv6 routes. So remove support for a binary path. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-15  [IPV6]: Consolidate the ip6_pol_route_(input|output) pair  Pavel Emelyanov
The difference in both functions is in the "id" passed to the rt6_select, so just pass it as an extra argument from two outer helpers. This is minus 60 lines of code and 360 bytes of .text Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
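The consolidation described above follows a common shape: one shared body that takes the differing value as a parameter, plus two thin wrappers. The sketch below only illustrates that shape; `struct demo_flow`, the `demo_*` functions, and the placeholder body that echoes the id are assumptions, not the kernel's lookup code.

```c
/* Hypothetical flow description; the field names are illustrative only. */
struct demo_flow {
    int iif;  /* incoming interface index */
    int oif;  /* outgoing interface index */
};

/* Shared body: the interface id is the only thing that used to differ. */
static int demo_pol_route(const struct demo_flow *fl, int ifindex)
{
    (void)fl;
    /* Placeholder for the real selection logic: echo the id that was used. */
    return ifindex;
}

/* Thin wrapper for the input path. */
static int demo_pol_route_input(const struct demo_flow *fl)
{
    return demo_pol_route(fl, fl->iif);
}

/* Thin wrapper for the output path. */
static int demo_pol_route_output(const struct demo_flow *fl)
{
    return demo_pol_route(fl, fl->oif);
}
```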
2007-10-10  [NET]: Make the loopback device per network namespace.  Eric W. Biederman
This patch makes loopback_dev per network namespace. Adding code to create a different loopback device for each network namespace and adding the code to free a loopback device when a network namespace exits. This patch modifies all users of loopback_dev so they access it as init_net.loopback_dev, keeping all of the code compiling and working. A later pass will be needed to update the users to use something other than the initial network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10  [NET]: Dynamically allocate the loopback device, part 1.  Daniel Lezcano
This patch replaces all occurrences of the static variable loopback_dev with a pointer loopback_dev. That provides the mindless, trivial, uninteresting change part for the dynamic allocation for the loopback. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-By: Kirill Korotaev <dev@sw.ru> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10  [NETLINK]: Introduce nested and byteorder flag to netlink attribute  Thomas Graf
This change allows the generic attribute interface to be used within the netfilter subsystem where this flag was initially introduced. The byte-order flag is yet unused; its intended use is to allow automatic byte order conversions for all atomic types. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10  [NET]: Make the device list and device lookups per namespace.  Eric W. Biederman
This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions:
    dev_getbyhwaddr
    dev_getfirsthwbytype
    dev_get_by_flags
    dev_get_by_name
    __dev_get_by_name
    dev_get_by_index
    __dev_get_by_index
    dev_ioctl
    dev_ethtool
    dev_load
    wireless_process_ioctl
were modified to take a network namespace argument, and deal with it.
vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument.
So basically anything in the core of the network stack that was affected by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net, the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces.
For now the ifindex generator is left global. Fundamentally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10  [NET]: Make /proc/net per network namespace  Eric W. Biederman
This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31  [IPV6]: Don't update ADVMSS on routes where the MTU is not also updated  Simon Arlott
The ADVMSS value was incorrectly updated for ALL routes when the MTU is updated because it's outside the effect of the if statement's condition. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-20  mm: Remove slab destructors from kmem_cache_create().  Paul Mundt
Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-06-07  [NETLINK]: Mark netlink policies const  Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24  [XFRM]: Allow packet drops during larval state resolution.  David S. Miller
The current IPSEC rule resolution behavior we have does not work for a lot of people, even though technically it's an improvement from the -EAGAIN business we had before. Right now we'll block until the key manager resolves the route. That works for simple cases, but many folks would rather packets get silently dropped until the key manager resolves the IPSEC rules. We can't tell these folks to "set the socket non-blocking" because they don't have control over the non-block setting of things like the sockets used to resolve DNS deep inside of the resolver libraries in libc. With that in mind I coded up the patch below with some help from Herbert Xu which provides packet-drop behavior during larval state resolution, controllable via sysctl and off by default. This lays the framework to either:
 1) Make this default at some point or...
 2) Move this logic into xfrm{4,6}_policy.c and implement the ARP-like resolution queue we've all been dreaming of. The idea would be to queue packets to the policy, then once the larval state is resolved by the key manager we re-resolve the route and push the packets out. The packets would timeout if the rule didn't get resolved in a certain amount of time.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25  [IPV6] SNMP: Fix several warnings without procfs.  YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-04-25  [NET]: cleanup extra semicolons  Stephen Hemminger
Spring cleaning time... There seems to be a lot of places in the network code that have extra bogus semicolons after conditionals. Most commonly is a bogus semicolon after: switch() { } Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25  [IPv6]: Use rtnl registration interface  Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25  [SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h  Arnaldo Carvalho de Melo
Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25  [SK_BUFF]: Introduce skb_reset_mac_header(skb)  Arnaldo Carvalho de Melo
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into an offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25  [IPV6]: Decentralize EXPORT_SYMBOLs.  YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-04-13  [IPV6] SNMP: Fix {In,Out}NoRoutes statistics.  YOSHIFUJI Hideaki
A packet which is being discarded because of no routes in the forwarding path should not be counted as OutNoRoutes but as InNoRoutes. Additionally, on this occasion, a packet whose destination is not valid should be counted as InAddrErrors separately. Based on patch from Mitsuru Chinen <mitch@linux.vnet.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-06  [IPV6]: Revert recent change to rt6_check_dev().  David S. Miller
This reverts a0d78ebf3a0e33a1aeacf2fc518ad9273d6a1c2f. It causes pings to link-local addresses to fail. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25  [IPV6]: Fix routing round-robin locking.  David S. Miller
As per RFC2461, section 6.3.6, item #2, when no routers on the matching list are known to be reachable or probably reachable we do round robin on those available routes so that we make sure to probe as many of them as possible to detect when one becomes reachable faster. Each routing table has a rwlock protecting the tree and the linked list of routes at each leaf. The round robin code executes during lookup and thus with the rwlock taken as a reader. A small local spinlock tries to provide protection but this does not work at all for two reasons:
 1) The round-robin list manipulation, as coded, goes like this (with read lock held):
        walk routes finding head and tail
        spin_lock();
        rotate list using head and tail
        spin_unlock();
    While one thread is rotating the list, another thread can end up with stale values of head and tail and then proceed to corrupt the list when it gets the lock. This ends up causing the OOPS in fib6_add() later on that many people have been hitting.
 2) All the other code paths that run with the rwlock held as a reader do not expect the list to change on them, they expect it to remain completely fixed while they hold the lock in that way.
So, simply stated, it is impossible to implement this correctly using a manipulation of the list without violating the rwlock locking semantics. Reimplement using a per-fib6_node round-robin pointer. This way we don't need to manipulate the list at all, and since the round-robin pointer can only ever point to real existing entries we don't need to perform any locking on the changing of the round-robin pointer itself. We only need to reset the round-robin pointer to NULL when the entry it is pointing to is removed. The idea is from Thomas Graf and it is very similar to how this was implemented before the advanced router selection code went in.
Signed-off-by: David S. Miller <davem@davemloft.net>
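The per-node round-robin pointer described above can be sketched in a single-threaded userspace model. The types and function names below are hypothetical simplifications, not the kernel's fib6_node or route structures, and the model has no locking at all; it only shows that selection moves a cursor instead of reordering the list, and that removal resets the cursor.

```c
#include <stddef.h>

/* Hypothetical, simplified types; not the kernel's structures. */
struct demo_route {
    struct demo_route *next;  /* sibling routes at the same leaf */
    int id;
};

struct demo_node {
    struct demo_route *leaf;    /* head of the route list, never reordered */
    struct demo_route *rr_ptr;  /* round-robin cursor; NULL means start at leaf */
};

/* Pick the route under the cursor, then advance the cursor with wrap-around. */
static struct demo_route *demo_rr_select(struct demo_node *fn)
{
    struct demo_route *rt = fn->rr_ptr ? fn->rr_ptr : fn->leaf;

    if (rt)
        fn->rr_ptr = rt->next ? rt->next : fn->leaf;
    return rt;
}

/* Unlink a route; reset the cursor if it pointed at the removed entry. */
static void demo_route_del(struct demo_node *fn, struct demo_route *victim)
{
    struct demo_route **pp;

    for (pp = &fn->leaf; *pp; pp = &(*pp)->next) {
        if (*pp == victim) {
            *pp = victim->next;
            break;
        }
    }
    if (fn->rr_ptr == victim)
        fn->rr_ptr = NULL;
}
```

Selection only reads the list and writes one pointer, which is why, per the commit message, it can run where readers expect the list itself to stay fixed; only removal, which already modifies the list, has to touch the cursor as well.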