aboutsummaryrefslogtreecommitdiff
path: root/net/core
AgeCommit message (Collapse)Author
2009-05-27net: net/core/sock.c cleanupEric Dumazet
Pure style cleanup patch. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27net: ALIGN/PTR_ALIGN cleanup in alloc_netdev_mq()/netdev_priv()Eric Dumazet
Use ALIGN() and PTR_ALIGN() macros instead of handcoding them. Get rid of NETDEV_ALIGN_CONST ugly define Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Store shinfo in local variable in skb_gro_receiveHerbert Xu
This patch stores the two shinfo pointers in local variables because they're used over and over again in skb_gro_receive. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Nasty optimisations for page frags in skb_gro_receiveHerbert Xu
This patch reverses the direction of the frags array copy in skb_gro_receive in order simplify the loop conditional. It also avoids touching the first element of the original frags array. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Open-code final pskb_may_pullHerbert Xu
As we know the only packets which need the final pskb_may_pull are completely non-linear, and have all the required bits in frag0, we can perform a straight memcpy instead of going through pskb_may_pull and doing skb_copy_bits. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Avoid unnecessary comparison after skb_gro_headerHerbert Xu
For the overwhelming majority of cases, skb_gro_header's return value cannot be NULL. Yet we must check it because of its current form. This patch splits it up into multiple functions in order to avoid this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Optimise length comparison in skb_gro_headerHerbert Xu
By caching frag0_len, we can avoid checking both frag0 and the length separately in skb_gro_header. This helps as skb_gro_header is called four times per packet which amounts to a few million times at 10Gb/s. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Only use skb_gro_header for completely non-linear packetsHerbert Xu
Currently skb_gro_header is used for packets which put the hardware header in skb->data with the rest in frags. Since the drivers that need this optimisation all provide completely non-linear packets, we can gain extra optimisations by only performing the frag0 optimisation for completely non-linear packets. In particular, we can simply test frag0 (instead of skb_headlen) to see whether the optimisation is in force. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Localise offset/headlen in skb_gro_offsetHerbert Xu
This patch stores the offset/headlen in local variables as they're used repeatedly in skb_gro_offset. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Inline skb_gro_header and cache frag0 virtual addressHerbert Xu
The function skb_gro_header is called four times per packet which quickly adds up at 10Gb/s. This patch inlines it to allow better optimisations. Some architectures perform multiplication for page_address, which is done by each skb_gro_header invocation. This patch caches that value in skb->cb to avoid the unnecessary multiplications. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Open-code frags copy in skb_gro_receiveHerbert Xu
gcc does a poor job at generating code for the memcpy of the frags array in skb_gro_receive, which is the primary purpose of that function when merging frags. In particular, it can't utilise the alignment information of the source and destination. This patch open-codes the copy so we process words instead of bytes. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-26net: Remove bogus reference to BUS_ID_SIZE in sysfs code.David S. Miller
BUS_ID_SIZE is really no more, and device names are dynamically allocated and thus can be any necessary size. So remove the BUG check here making sure BUS_ID_SIZE is at least as large as IFNAMSIZ. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25net: txq_trans_update() helperEric Dumazet
We would like to get rid of netdev->trans_start = jiffies; that about all net drivers have to use in their start_xmit() function, and use txq->trans_start instead. This can be done generically in core network, as suggested by David. Some devices, (particularly loopback) dont need trans_start update, because they dont have transmit watchdog. We could add a new device flag, or rely on fact that txq->tran_start can be updated is txq->xmit_lock_owner is different than -1. Use a helper function to hide our choice. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25pkt_sched: gen_estimator: Fix signed integers right-shifts.Jarek Poplawski
Right-shifts of signed integers are implementation-defined so unportable. With feedback from: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25net: remove COMPAT_NET_DEV_OPSAlexander Beregalov
All drivers are already converted to new net_device_ops API and nobody uses old API anymore. Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath/ath5k/phy.c drivers/net/wireless/iwlwifi/iwl-agn.c drivers/net/wireless/iwlwifi/iwl3945-base.c
2009-05-25skbuff: Copy csum instead of csum_start/csum_offsetHerbert Xu
Hi: skbuff: Copy csum instead of csum_start/csum_offset It's easier to copy the u32 csum instead of its two u16 constituents. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25skbuff: Move new code into __copy_skb_headerHerbert Xu
Hi: skbuff: Move new __skb_clone code into __copy_skb_header It seems that people just keep on adding stuff to __skb_clone instead __copy_skb_header. This is wrong as it means your brand-new attributes won't always get copied as you intended. This patch moves them to the right place, and adds a comment to prevent this from happening again. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Thanks, Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21net: Fix arg to trace_napi_poll() in netpoll.David S. Miller
Reproted by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21dropmon: add ability to detect when hardware dropsrxpacketsNeil Horman
Patch to add the ability to detect drops in hardware interfaces via dropwatch. Adds a tracepoint to net_rx_action to signal everytime a napi instance is polled. The dropmon code then periodically checks to see if the rx_frames counter has changed, and if so, adds a drop notification to the netlink protocol, using the reserved all-0's vector to indicate the drop location was in hardware, rather than somewhere in the code. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> include/linux/net_dropmon.h | 8 ++ include/trace/napi.h | 11 +++ net/core/dev.c | 5 + net/core/drop_monitor.c | 124 ++++++++++++++++++++++++++++++++++++++++++-- net/core/net-traces.c | 4 + net/core/netpoll.c | 2 6 files changed, 149 insertions(+), 5 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21netns: simplify net_ns_initStephen Hemminger
The net_ns_init code can be simplified. No need to save error code if it is only going to panic if it is set 4 lines later. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21netns: remove leftover debugging messageStephen Hemminger
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21pktgen: do not access flows[] beyond its lengthFlorian Westphal
typo -- pkt_dev->nflows is for stats only, the number of concurrent flows is stored in cflows. Reported-By: Vladimir Ivashchenko <hazard@francoudi.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20net: Remove unused parameter from fill method in fib_rules_ops.Rami Rosen
The netlink message header (struct nlmsghdr) is an unused parameter in fill method of fib_rules_ops struct. This patch removes this parameter from this method and fixes the places where this method is called. (include/net/fib_rules.h) Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18net: release dst entry in dev_hard_start_xmit()Eric Dumazet
One point of contention in high network loads is the dst_release() performed when a transmited skb is freed. This is because NIC tx completion calls dev_kree_skb() long after original call to dev_queue_xmit(skb). CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is quite visible if one CPU is 100% handling softirqs for a network device, since dst_clone() is done by other cpus, involving cache line ping pongs. It seems right place to release dst is in dev_hard_start_xmit(), for most devices but ones that are virtual, and some exceptions. David Miller suggested to define a new device flag, set in alloc_netdev_mq() (so that most devices set it at init time), and carefuly unset in devices which dont want a NULL skb->dst in their ndo_start_xmit(). List of devices that must clear this flag is : - loopback device, because it calls netif_rx() and quoting Patrick : "ip_route_input() doesn't accept loopback addresses, so loopback packets already need to have a dst_entry attached." - appletalk/ipddp.c : needs skb->dst in its xmit function - And all devices that call again dev_queue_xmit() from their xmit function (as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18net-sysfs: Use rtnl_trylock in sysfs methods.Eric W. Biederman
The earlier patch to fix the deadlock between a network device going away and writing to sysfs attributes was incomplete. - It did not set signal_pending so we would leak ERSTARTSYS to user space. - It used ERESTARTSYS which only restarts if sigaction configures it to. - It did not cover store and show for ifalias. So fix all of these up and use the new helper restart_syscall so we get the details correct on what it takes. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18net: fix skb_seq_read returning wrong offset/length for page frag dataThomas Chenault
When called with a consumed value that is less than skb_headlen(skb) bytes into a page frag, skb_seq_read() incorrectly returns an offset/length relative to skb->data. Ensure that data which should come from a page frag does. Signed-off-by: Thomas Chenault <thomas_chenault@dell.com> Tested-by: Shyam Iyer <shyam_iyer@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/scsi/fcoe/fcoe.c
2009-05-18pkt_sched: gen_estimator: use 64 bit intermediate counters for bpsEric Dumazet
gen_estimator can overflow bps (bytes per second) with Gb links, while it was designed with a u32 API, with a theorical limit of 34360Mbit (2^32 bytes) Using 64 bit intermediate avbps/brate counters can allow us to reach this theorical limit. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18net: add tx_packets/tx_bytes/tx_dropped counters in struct netdev_queueEric Dumazet
offsetof(struct net_device, features)=0x44 offsetof(struct net_device, stats.tx_packets)=0x54 offsetof(struct net_device, stats.tx_bytes)=0x5c offsetof(struct net_device, stats.tx_dropped)=0x6c Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their tx path can slow down SMP operations, since they dirty a cache line that should stay shared (dev->features is needed in rx and tx paths) We could move away stats field in net_device but it wont help that much. (Two cache lines dirtied in tx path, we can do one only) Better solution is to add tx_packets/tx_bytes/tx_dropped in struct netdev_queue because this structure is already touched in tx path and counters updates will then be free (no increase in size) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17tcp: tcp_prequeue() can use keyed wakeupsJohn Dykstra
When TCP frees up write buffer space, avoid waking up tasks that have done a poll() or select() on the same socket specifying read-side events. This is an extension of a read-side patch by Eric Dumazet. Signed-off-by: John Dykstra <john.dykstra1@gmail.com> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17netpoll: don't dereference NULL dev from npPavel Emelyanov
It looks like the dev in netpoll_poll can be NULL - at lease it's checked at the function beginning. Thus the dev->netde_ops dereference looks dangerous. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17ipv4: remove an unused parameter from configure method of fib_rules_ops.Rami Rosen
Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-09net: check retval of dev_addr_init()Jiri Pirko
Add missed checking of dev_addr_init return value in alloc_netdev_mq. Signed-off-by: Jiri Pirko <jpirko@redhat.com> net/core/dev.c | 15 ++++++++++++--- 1 files changed, 12 insertions(+), 3 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-08Network Drop Monitor: Fix skb_kill_datagramJohn Dykstra
Commit ead2ceb0ec9f85cff19c43b5cdb2f8a054484431 ("Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs") established new conventions for identifying dropped packets. Align skb_kill_datagram() with these conventions so that packets that get dropped just before the copy to userspace are properly tracked. Signed-off-by: John Dykstra <john.dykstra1@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-08Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: include/net/tcp.h
2009-05-06net: update skb_recycle_check() for hardware timestamping changesLennert Buytenhek
Commit ac45f602ee3d1b6f326f68bc0c2591ceebf05ba4 ("net: infrastructure for hardware time stamping") added two skb initialization actions to __alloc_skb(), which need to be added to skb_recycle_check() as well. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-05net: introduce a list of device addresses dev_addr_list (v6)Jiri Pirko
v5 -> v6 (current): -removed so far unused static functions -corrected dev_addr_del_multiple to call del instead of add v4 -> v5: -added device address type (suggested by davem) -removed refcounting (better to have simplier code then safe potentially few bytes) v3 -> v4: -changed kzalloc to kmalloc in __hw_addr_add_ii() -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init() v2 -> v3: -removed unnecessary rcu read locking -moved dev_addr_flush() calling to ensure no null dereference of dev_addr v1 -> v2: -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush -removed unnecessary rcu_read locking in dev_addr_init -use compare_ether_addr_64bits instead of compare_ether_addr -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr -use call_rcu instead of rcu_synchronize -moved is_etherdev_addr into __KERNEL__ ifdef This patch introduces a new list in struct net_device and brings a set of functions to handle the work with device address list. The list is a replacement for the original dev_addr field and because in some situations there is need to carry several device addresses with the net device. To be backward compatible, dev_addr is made to point to the first member of the list so original drivers sees no difference. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-04netns 2/2: extract net_create()Alexey Dobriyan
net_create() will be used by C/R to create fresh netns on restart. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-04netns 1/2: don't get/put old netns on CLONE_NEWNETAlexey Dobriyan
copy_net_ns() doesn't copy anything, it creates fresh netns, so get/put of old netns isn't needed. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-03net: Avoid modulus in skb_tx_hash() for forwarding case.David S. Miller
Based almost entirely upon a patch by Eric Dumazet. The common case is to have num-tx-queues <= num_rx_queues and even if num_tx_queues is larger it will not be significantly larger. Therefore, a subtraction loop is always going to be faster than modulus. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-03Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2009-05-01net: Fix skb_tx_hash() for forwarding workloads.Eric Dumazet
When skb_rx_queue_recorded() is true, we dont want to use jash distribution as the device driver exactly told us which queue was selected at RX time. jhash makes a statistical shuffle, but this wont work with 8 static inputs. Later improvements would be to compute reciprocal value of real_num_tx_queues to avoid a divide here. But this computation should be done once, when real_num_tx_queues is set. This needs a separate patch, and a new field in struct net_device. Reported-by: Andrew Dickinson <andrew@whydna.net> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-30net: Fix oops when splicing skbs from a frag_list.Jarek Poplawski
Lennert Buytenhek wrote: > Since 4fb669948116d928ae44262ab7743732c574630d ("net: Optimize memory > usage when splicing from sockets.") I'm seeing this oops (e.g. in > 2.6.30-rc3) when splicing from a TCP socket to /dev/null on a driver > (mv643xx_eth) that uses LRO in the skb mode (lro_receive_skb) rather > than the frag mode: My patch incorrectly assumed skb->sk was always valid, but for "frag_listed" skbs we can only use skb->sk of their parent. Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Debugged-by: Lennert Buytenhek <buytenh@wantstofly.org> Tested-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-29Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/isdn/00-INDEX drivers/net/wireless/iwlwifi/iwl-scan.c drivers/net/wireless/rndis_wlan.c net/mac80211/main.c
2009-04-28net: Avoid extra wakeups of threads blocked in wait_for_packet()Eric Dumazet
In 2.6.25 we added UDP mem accounting. This unfortunatly added a penalty when a frame is transmitted, since we have at TX completion time to call sock_wfree() to perform necessary memory accounting. This calls sock_def_write_space() and utimately scheduler if any thread is waiting on the socket. Thread(s) waiting for an incoming frame was scheduled, then had to sleep again as event was meaningless. (All threads waiting on a socket are using same sk_sleep anchor) This adds lot of extra wakeups and increases latencies, as noted by Christoph Lameter, and slows down softirq handler. Reference : http://marc.info/?l=linux-netdev&m=124060437012283&w=2 Fortunatly, Davide Libenzi recently added concept of keyed wakeups into kernel, and particularly for sockets (see commit 37e5540b3c9d838eb20f2ca8ea2eb8072271e403 epoll keyed wakeups: make sockets use keyed wakeups) Davide goal was to optimize epoll, but this new wakeup infrastructure can help non epoll users as well, if they care to setup an appropriate handler. This patch introduces new DEFINE_WAIT_FUNC() helper and uses it in wait_for_packet(), so that only relevant event can wakeup a thread blocked in this function. Trace of function calls from bnx2 TX completion bnx2_poll_work() is : __kfree_skb() skb_release_head_state() sock_wfree() sock_def_write_space() __wake_up_sync_key() __wake_up_common() receiver_wake_function() : Stops here since thread is waiting for an INPUT Reported-by: Christoph Lameter <cl@linux.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27gro: Fix handling of headers that extend over the tailHerbert Xu
The skb_gro_* code fails to handle the case where a header starts in the linear area but ends in the frags area. Since the goal of skb_gro_* is to optimise the case of completely non-linear packets, we can simply bail out if we have anything in the linear area. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27drop_monitor: Update netlink protocol to include netlink attribute header in ↵Neil Horman
alert message When I initially implemented this protocol, I disregarded the use of netlink attribute headers, thinking for my purposes they weren't needed. I've come to find out that, as I'm starting to work with sending down messages with associated data (like config messages), the kernel code spits out warnings about trailing data in a netlink skb that doesn't have an associated header on it. As such, I'm going to start including attribute headers in my netlink transaction, and so for completeness, I should likely include them on messages bound from the kernel to user space. This patch adds that header to the kernel, and bumps the protocol version accordingly Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-21tun: fix tun_chr_aio_write so that aio worksMichael S. Tsirkin
aio_write gets const struct iovec * but tun_chr_aio_write casts this to struct iovec * and modifies the iovec. As a result, attempts to use io_submit to send packets to a tun device fail with weird errors such as EINVAL. Since tun is the only user of skb_copy_datagram_from_iovec, we can fix this simply by changing the later so that it does not touch the iovec passed to it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-21net: skb_copy_datagram_const_iovec()Michael S. Tsirkin
There's an skb_copy_datagram_iovec() to copy out of a paged skb, but it modifies the iovec, and does not support starting at an offset in the destination. We want both in tun.c, so let's add the function. It's a carbon copy of skb_copy_datagram_iovec() with enough changes to be annoying. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>