aboutsummaryrefslogtreecommitdiff
path: root/net/core
AgeCommit message (Collapse)Author
2005-11-05[NET]: Fix race condition in sk_stream_wait_connectHerbert Xu
When sk_stream_wait_connect detects a state transition to ESTABLISHED or CLOSE_WAIT prior to it going to sleep, it will return without calling finish_wait and decrementing sk_write_pending. This may result in crashes and other unintended behaviour. The fix is to always call finish_wait and update sk_write_pending since it is safe to do so even if the wait entry is no longer on the queue. This bug was tracked down with the help of Alex Sidorenko and the fix is also based on his suggestion. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-02[NET]: Fix zero-size datagram receptionHerbert Xu
The recent rewrite of skb_copy_datagram_iovec broke the reception of zero-size datagrams. This patch fixes it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-28[IPv4/IPv6]: UFO Scatter-gather approachAnanda Raju
Attached is kernel patch for UDP Fragmentation Offload (UFO) feature. 1. This patch incorporate the review comments by Jeff Garzik. 2. Renamed USO as UFO (UDP Fragmentation Offload) 3. udp sendfile support with UFO This patches uses scatter-gather feature of skb to generate large UDP datagram. Below is a "how-to" on changes required in network device driver to use the UFO interface. UDP Fragmentation Offload (UFO) Interface: ------------------------------------------- UFO is a feature wherein the Linux kernel network stack will offload the IP fragmentation functionality of large UDP datagram to hardware. This will reduce the overhead of stack in fragmenting the large UDP datagram to MTU sized packets 1) Drivers indicate their capability of UFO using dev->features |= NETIF_F_UFO | NETIF_F_HW_CSUM | NETIF_F_SG NETIF_F_HW_CSUM is required for UFO over ipv6. 2) UFO packet will be submitted for transmission using driver xmit routine. UFO packet will have a non-zero value for "skb_shinfo(skb)->ufo_size" skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP fragment going out of the adapter after IP fragmentation by hardware. skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[] contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW indicating that hardware has to do checksum calculation. Hardware should compute the UDP checksum of complete datagram and also ip header checksum of each fragmented IP packet. For IPV6 the UFO provides the fragment identification-id in skb_shinfo(skb)->ip6_frag_id. The adapter should use this ID for generating IPv6 fragments. Signed-off-by: Ananda Raju <ananda.raju@neterion.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (forwarded) Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-28Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.15Linus Torvalds
2005-10-28[PATCH] gfp_t: net/*Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-26[PATCH] kill massive wireless-related log spamJeff Garzik
Although this message is having the intended effect of causing wireless driver maintainers to upgrade their code, I never should have merged this patch in its present form. Leading to tons of bug reports and unhappy users. Some wireless apps poll for statistics regularly, which leads to a printk() every single time they ask for stats. That's a little bit _too_ much of a reminder that the driver is using an old API. Change this to printing out the message once, per kernel boot. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-26[SK_BUFF] kernel-doc: fix skbuff warningsRandy Dunlap
Add kernel-doc to skbuff.h, skbuff.c to eliminate kernel-doc warnings. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[PKTGEN]: proc interface revisionStephen Hemminger
The code to handle the /proc interface can be cleaned up in several places: * use seq_file for read * don't need to remember all the filenames separately * use for_online_cpu's * don't vmalloc a buffer for small command from user. Committer note: This patch clashed with John Hawkes's "[NET]: Wider use of for_each_*cpu()", so I fixed it up manually. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[PKTGEN]: Spelling and white spaceStephen Hemminger
Fix some cosmetic issues. Indentation, spelling errors, and some whitespace. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[PKTGEN]: Use kzallocStephen Hemminger
These are cleanup patches for pktgen that can go in 2.6.15 Can use kzalloc in a couple of places. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[PKTGEN]: Sleeping function called under lockStephen Hemminger
pktgen is calling kmalloc GFP_KERNEL and vmalloc with lock held. The simplest fix is to turn the lock into a semaphore, since the thread lock is only used for admin control from user context. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-25[NET]: Wider use of for_each_*cpu()John Hawkes
In 'net' change the explicit use of for-loops and NR_CPUS into the general for_each_cpu() or for_each_online_cpu() constructs, as appropriate. This widens the scope of potential future optimizations of the general constructs, as well as takes advantage of the existing optimizations of first_cpu() and next_cpu(), which is advantageous when the true CPU count is much smaller than NR_CPUS. Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-23[NEIGH] Fix timer leak in neigh_changeaddrHerbert Xu
neigh_changeaddr attempts to delete neighbour timers without setting nud_state. This doesn't work because the timer may have already fired when we acquire the write lock in neigh_changeaddr. The result is that the timer may keep firing for quite a while until the entry reaches NEIGH_FAILED. It should be setting the nud_state straight away so that if the timer has already fired it can simply exit once we relinquish the lock. In fact, this whole function is simply duplicating the logic in neigh_ifdown which in turn is already doing the right thing when it comes to deleting timers and setting nud_state. So all we have to do is take that code out and put it into a common function and make both neigh_changeaddr and neigh_ifdown call it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-23[NEIGH] Fix add_timer race in neigh_add_timerHerbert Xu
neigh_add_timer cannot use add_timer unconditionally. The reason is that by the time it has obtained the write lock someone else (e.g., neigh_update) could have already added a new timer. So it should only use mod_timer and deal with its return value accordingly. This bug would have led to rare neighbour cache entry leaks. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-23[NEIGH] Print stack trace in neigh_add_timerHerbert Xu
Stack traces are very helpful in determining the exact nature of a bug. So let's print a stack trace when the timer is added twice. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-22[SK_BUFF]: ipvs_property field must be copiedJulian Anastasov
IPVS used flag NFC_IPVS_PROPERTY in nfcache but as now nfcache was removed the new flag 'ipvs_property' still needs to be copied. This patch should be included in 2.6.14. Further comments from Harald Welte: Sorry, seems like the bug was introduced by me. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-08[PATCH] gfp flags annotations - part 1Al Viro
- added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-03[IPV4]: Get rid of bogus __in_put_dev in pktgenHerbert Xu
This patch gets rid of a bogus __in_dev_put() in pktgen.c. This was spotted by Suzanne Wood. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03[IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnlHerbert Xu
The following patch renames __in_dev_get() to __in_dev_get_rtnl() and introduces __in_dev_get_rcu() to cover the second case. 1) RCU with refcnt should use in_dev_get(). 2) RCU without refcnt should use __in_dev_get_rcu(). 3) All others must hold RTNL and use __in_dev_get_rtnl(). There is one exception in net/ipv4/route.c which is in fact a pre-existing race condition. I've marked it as such so that we remember to fix it. This patch is based on suggestions and prior work by Suzanne Wood and Paul McKenney. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03[NET]: Fix packet timestamping.Herbert Xu
I've found the problem in general. It affects any 64-bit architecture. The problem occurs when you change the system time. Suppose that when you boot your system clock is forward by a day. This gets recorded down in skb_tv_base. You then wind the clock back by a day. From that point onwards the offset will be negative which essentially overflows the 32-bit variables they're stored in. In fact, why don't we just store the real time stamp in those 32-bit variables? After all, we're not going to overflow for quite a while yet. When we do overflow, we'll need a better solution of course. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-29Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
2005-09-29[PATCH] proc_mkdir() should be used to create procfs directoriesAl Viro
A bunch of create_proc_dir_entry() calls creating directories had crept in since the last sweep; converted to proc_mkdir(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-27[NET]: Fix module reference counts for loadable protocol modulesFrank Filz
I have been experimenting with loadable protocol modules, and ran into several issues with module reference counting. The first issue was that __module_get failed at the BUG_ON check at the top of the routine (checking that my module reference count was not zero) when I created the first socket. When sk_alloc() is called, my module reference count was still 0. When I looked at why sctp didn't have this problem, I discovered that sctp creates a control socket during module init (when the module ref count is not 0), which keeps the reference count non-zero. This section has been updated to address the point Stephen raised about checking the return value of try_module_get(). The next problem arose when my socket init routine returned an error. This resulted in my module reference count being decremented below 0. My socket ops->release routine was also being called. The issue here is that sock_release() calls the ops->release routine and decrements the ref count if sock->ops is not NULL. Since the socket probably didn't get correctly initialized, this should not be done, so we will set sock->ops to NULL because we will not call try_module_get(). While searching for another bug, I also noticed that sys_accept() has a possibility of doing a module_put() when it did not do an __module_get so I re-ordered the call to security_socket_accept(). Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[NET]: Prefetch dev->qdisc_lock in dev_queue_xmit()Eric Dumazet
We know the lock is going to be taken. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[NET]: Use non-recursive algorithm in skb_copy_datagram_iovec()Daniel Phillips
Use iteration instead of recursion. Fraglists within fraglists should never occur, so we BUG check this. Signed-off-by: Daniel Phillips <phillips@istop.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[NEIGH]: Add debugging check when adding timers.David S. Miller
If we double-add a neighbour entry timer, which should be impossible but has been reported, dump the current state of the entry so that we can debug this. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/llc-2.6David S. Miller
2005-09-24[NET]: Protect neigh_stat_seq_fops by CONFIG_PROC_FSAmos Waterland
From: Amos Waterland <apw@us.ibm.com> If CONFIG_PROC_FS is not selected, the compiler emits this warning: net/core/neighbour.c:64: warning: `neigh_stat_seq_fops' defined but not used Which is correct, because neigh_stat_seq_fops is in fact only initialized and used by code that is protected by CONFIG_PROC_FS. So this patch fixes that up. Signed-off-by: Amos Waterland <apw@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-22[LLC]: Fix for Bugzilla ticket #5156Jochen Friedrich
Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-12[NET]: fix-up schedule_timeout() usageNishanth Aravamudan
Use schedule_timeout_{,un}interruptible() instead of set_current_state()/schedule_timeout() to reduce kernel size. Also use human-time conversion functions instead of hard-coded division to avoid rounding issues. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-09[PATCH] more SPIN_LOCK_UNLOCKED -> DEFINE_SPINLOCK conversionsIngo Molnar
This converts the final 20 DEFINE_SPINLOCK holdouts. (another 580 places are already using DEFINE_SPINLOCK). Build tested on x86. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09[PATCH] timer initialization cleanup: DEFINE_TIMERIngo Molnar
Clean up timer initialization by introducing DEFINE_TIMER a'la DEFINE_SPINLOCK. Build and boot-tested on x86. A similar patch has been been in the -RT tree for some time. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07Merge branch 'upstream' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
2005-09-06[NET]: proto_unregister: fix sleeping while atomicPatrick McHardy
proto_unregister holds a lock while calling kmem_cache_destroy, which can sleep. Noticed by Daniele Orlandi <daniele@orlandi.com>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-06[PATCH] WE-19 for kernel 2.6.13Jean Tourrilhes
Hi Jeff, This is version 19 of the Wireless Extensions. It was supposed to be the fallback of the WPA API changes, but people seem quite happy about it (especially Jouni), so the patch is rather small. The patch has been fully tested with 2.6.13 and various wireless drivers, and is in its final version. Would you mind pushing that into Linus's kernel so that the driver and the apps can take advantage ot it ? It includes : o iwstat improvement (explicit dBm). This is the result of long discussions with Dan Williams, the authors of NetworkManager. Thanks to him for all the fruitful feedback. o remove pointer from event stream. I was not totally sure if this pointer was 32-64 bits clean, so I'd rather remove it and be at peace with it. o remove linux header from wireless.h. This has long been requested by people writting user space apps, now it's done, and it was not even painful. o final deprecation of spy_offset. You did not like it, it's now gone for good. o Start deprecating dev->get_wireless_stats -> debloat netdev o Add "check" version of event macros for ieee802.11 stack. Jiri Benc doesn't like the current macros, we aim to please ;-) All those changes, except the last one, have been bit-roting on my web pages for a while... Patches for most kernel drivers will follow. Patches for the Orinoco and the HostAP drivers have been sent to their respective maintainers. Have fun... Jean Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-06[NET]: Make sure l_linger is unsigned to avoid negative timeoutsEric Dumazet
One of my x86_64 (linux 2.6.13) server log is filled with : schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca This is because some application does a struct linger li; li.l_onoff = 1; li.l_linger = -1; setsockopt(sock, SOL_SOCKET, SO_LINGER, &li, sizeof(li)); And unfortunatly l_linger is defined as a 'signed int' in include/linux/socket.h: struct linger { int l_onoff; /* Linger active */ int l_linger; /* How long to linger for */ }; I dont know if it's safe to change l_linger to 'unsigned int' in the include file (It might be defined as int in ABI specs) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-06Merge branch 'upstream' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
2005-09-05[NET]: 2.6.13 breaks libpcap (and tcpdump)Herbert Xu
Patrick McHardy says: Never mind, I got it, we never fall through to the second switch statement anymore. I think we could simply break when load_pointer returns NULL. The switch statement will fall through to the default case and return 0 for all cases but 0 > k >= SKF_AD_OFF. Here's a patch to do just that. I left BPF_MSH alone because it's really a hack to calculate the IP header length, which makes no sense when applied to the special data. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-05[NET]: Do not protect sysctl_optmem_max with CONFIG_SYSCTLDavid S. Miller
The ipv4 and ipv6 protocols need to access it unconditionally. SYSCTL=n build failure reported by Russell King. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-05[PATCH] (7/7) __user annotations (ethtool)viro@ftp.linux.org.uk
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-08-29[NET]: use __read_mostly on kmem_cache_t , DEFINE_SNMP_STAT pointersEric Dumazet
This patch puts mostly read only data in the right section (read_mostly), to help sharing of these data between CPUS without memory ping pongs. On one of my production machine, tcp_statistics was sitting in a heavily modified cache line, so *every* SNMP update had to force a reload. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[NET]: Add support for getting the permanent hardware address.Jon Wetzel
This patch adds a new field to net device to hold the permanent hardware address, and adds a new generic ethtool_op function to get that address. Signed-off-by: Jon Wetzel <jon_wetzel@dell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[NET]: Implement SKB fast cloning.David S. Miller
Protocols that make extensive use of SKB cloning, for example TCP, eat at least 2 allocations per packet sent as a result. To cut the kmalloc() count in half, we implement a pre-allocation scheme wherein we allocate 2 sk_buff objects in advance, then use a simple reference count to free up the memory at the correct time. Based upon an initial patch by Thomas Graf and suggestions from Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[NET]: Fix sparse warningsArnaldo Carvalho de Melo
Of this type, mostly: CHECK net/ipv6/netfilter.c net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static? net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static? Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[NETLINK]: Add "groups" argument to netlink_kernel_createPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[NETLINK]: Convert netlink users to use group numbers instead of bitmasksPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[NET]: Store skb->timestamp as offset to a base timestampPatrick McHardy
Reduces skb size by 8 bytes on 64-bit. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[NETFILTER]: split net/core/netfilter.c into net/netfilter/*.cHarald Welte
This patch doesn't introduce any code changes, but merely splits the core netfilter code into four separate files. It also moves it from it's old location in net/core/ to the recently-created net/netfilter/ directory. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[ICSK]: Introduce reqsk_queue_prune from code in tcp_synack_timerArnaldo Carvalho de Melo
With this we're very close to getting all of the current TCP refactorings in my dccp-2.6 tree merged, next changeset will export some functions needed by the current DCCP code and then dccp-2.6.git will be born! Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[SOCK]: Introduce sk_cloneArnaldo Carvalho de Melo
Out of tcp_create_openreq_child, will be used in dccp_create_openreq_child, and is a nice sock function anyway. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>