aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorHerbert Xu <herbert@gondor.apana.org.au>2009-02-04 16:55:27 -0800
committerDavid S. Miller <davem@davemloft.net>2009-02-04 16:55:27 -0800
commit9a279bcbe347496799711155ed41a89bc40f79c5 (patch)
treedc966d55c732772373ceaaec1c7942fca0ee99e4
parent7870389478d3c682c79c07abe7f1fce8b8a81952 (diff)
net: Partially allow skb destructors to be used on receive path
As it currently stands, skb destructors are forbidden on the receive path because the protocol end-points will overwrite any existing destructor with their own. This is the reason why we have to call skb_orphan in the loopback driver before we reinject the packet back into the stack, thus creating a period during which loopback traffic isn't charged to any socket. With virtualisation, we have a similar problem in that traffic is reinjected into the stack without being associated with any socket entity, thus providing no natural congestion push-back for those poor folks still stuck with UDP. Now had we been consistent in telling them that UDP simply has no congestion feedback, I could just fob them off. Unfortunately, we appear to have gone to some length in catering for this on the standard UDP path, with skb/socket accounting so that has created a very unhealthy dependency. Alas habits are difficult to break out of, so we may just have to allow skb destructors on the receive path. It turns out that making skb destructors useable on the receive path isn't as easy as it seems. For instance, simply adding skb_orphan to skb_set_owner_r isn't enough. This is because we assume all over the IP stack that skb->sk is an IP socket if present. The new transparent proxy code goes one step further and assumes that skb->sk is the receiving socket if present. Now all of this can be dealt with by adding simple checks such as only treating skb->sk as an IP socket if skb->sk->sk_family matches. However, it turns out that for bridging at least we don't need to do all of this work. This is of interest because most virtualisation setups use bridging so we don't actually go through the IP stack on the host (with the exception of our old nemesis the bridge netfilter, but that's easily taken care of). So this patch simply adds skb_orphan to the point just before we enter the IP stack, but after we've gone through the bridge on the receive path. It also adds an skb_orphan to the one place in netfilter that touches skb->sk/skb->destructor, that is, tproxy. One word of caution, because of the internal code structure, anyone wishing to deploy this must use skb_set_owner_w as opposed to skb_set_owner_r since many functions that create a new skb from an existing one will invoke skb_set_owner_w on the new skb. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-rw-r--r--net/core/dev.c2
-rw-r--r--net/netfilter/nf_tproxy_core.c1
2 files changed, 3 insertions, 0 deletions
diff --git a/net/core/dev.c b/net/core/dev.c
index 220f52a1001..3337cf98f23 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2288,6 +2288,8 @@ ncls:
if (!skb)
goto out;
+ skb_orphan(skb);
+
type = skb->protocol;
list_for_each_entry_rcu(ptype,
&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
diff --git a/net/netfilter/nf_tproxy_core.c b/net/netfilter/nf_tproxy_core.c
index cdc97f3105a..5490fc37c92 100644
--- a/net/netfilter/nf_tproxy_core.c
+++ b/net/netfilter/nf_tproxy_core.c
@@ -71,6 +71,7 @@ int
nf_tproxy_assign_sock(struct sk_buff *skb, struct sock *sk)
{
if (inet_sk(sk)->transparent) {
+ skb_orphan(skb);
skb->sk = sk;
skb->destructor = nf_tproxy_destructor;
return 1;