aboutsummaryrefslogtreecommitdiff
path: root/net/core/dev.c
AgeCommit message (Collapse)Author
2009-09-30net: restore tx timestamping for accelerated vlansEric Dumazet
Since commit 9b22ea560957de1484e6b3e8538f7eef202e3596 ( net: fix packet socket delivery in rx irq handler ) We lost rx timestamping of packets received on accelerated vlans. Effect is that tcpdump on real dev can show strange timings, since it gets rx timestamps too late (ie at skb dequeueing time, not at skb queueing time) 14:47:26.986871 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 1 14:47:26.986786 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 1 14:47:27.986888 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 2 14:47:27.986781 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 2 14:47:28.986896 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 3 14:47:28.986780 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 3 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-15bonding: remap muticast addresses without using dev_close() and dev_open()Moni Shoua
This patch fixes commit e36b9d16c6a6d0f59803b3ef04ff3c22c3844c10. The approach there is to call dev_close()/dev_open() whenever the device type is changed in order to remap the device IP multicast addresses to HW multicast addresses. This approach suffers from 2 drawbacks: *. It assumes tha the device is UP when calling dev_close(), or otherwise dev_close() has no affect. It is worth to mention that initscripts (Redhat) and sysconfig (Suse) doesn't act the same in this matter. *. dev_close() has other side affects, like deleting entries from the routing table, which might be unnecessary. The fix here is to directly remap the IP multicast addresses to HW multicast addresses for a bonding device that changes its type, and nothing else. Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits) netxen: update copyright netxen: fix tx timeout recovery netxen: fix file firmware leak netxen: improve pci memory access netxen: change firmware write size tg3: Fix return ring size breakage netxen: build fix for INET=n cdc-phonet: autoconfigure Phonet address Phonet: back-end for autoconfigured addresses Phonet: fix netlink address dump error handling ipv6: Add IFA_F_DADFAILED flag net: Add DEVTYPE support for Ethernet based devices mv643xx_eth.c: remove unused txq_set_wrr() ucc_geth: Fix hangs after switching from full to half duplex ucc_geth: Rearrange some code to avoid forward declarations phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs drivers/net/phy: introduce missing kfree drivers/net/wan: introduce missing kfree net: force bridge module(s) to be GPL Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded ... Fixed up trivial conflicts: - arch/x86/include/asm/socket.h converted to <asm-generic/socket.h> in the x86 tree. The generic header has the same new #define's, so that works out fine. - drivers/net/tun.c fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that switched over to using 'tun->socket.sk' instead of the redundantly available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks to the TUN driver") which added a new 'tun->sk' use. Noted in 'next' by Stephen Rothwell.
2009-09-11net: force bridge module(s) to be GPLStephen Hemminger
The only valid usage for the bridge frame hooks are by a GPL components (such as the bridge module). The kernel should not leave a crack in the door for proprietary networking stacks to slip in. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03net: Remove debugging codeEric Dumazet
Remove a debugging aid I accidently left in previous 'cleanup' patch Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03net: net/core/dev.c cleanupsEric Dumazet
Pure style cleanup patch before surgery :) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-30net: convert remaining non-symbolic return values in dev_queue_xmitKrishna Kumar
Patch compiled and 32 simultaneous netperf testing ran fine. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-19Drop ARPHRD_IEEE802154_PHYDmitry Eremin-Solenikov
There are not maste devices in mac802154 anymore, so drop ARPHRD_IEEE802154_PHY definition. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-08-14Networking: use CAP_NET_ADMIN when deciding to call request_moduleEric Paris
The networking code checks CAP_SYS_MODULE before using request_module() to try to load a kernel module. While this seems reasonable it's actually weakening system security since we have to allow CAP_SYS_MODULE for things like /sbin/ip and bluetoothd which need to be able to trigger module loads. CAP_SYS_MODULE actually grants those binaries the ability to directly load any code into the kernel. We should instead be protecting modprobe and the modules on disk, rather than granting random programs the ability to load code directly into the kernel. Instead we are going to gate those networking checks on CAP_NET_ADMIN which still limits them to root but which does not grant those processes the ability to load arbitrary code into the kernel. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Paul Moore <paul.moore@hp.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: James Morris <jmorris@namei.org>
2009-08-12Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: arch/microblaze/include/asm/socket.h
2009-08-06net: Avoid enqueuing skb for default qdiscsKrishna Kumar
dev_queue_xmit enqueue's a skb and calls qdisc_run which dequeue's the skb and xmits it. In most cases, the skb that is enqueue'd is the same one that is dequeue'd (unless the queue gets stopped or multiple cpu's write to the same queue and ends in a race with qdisc_run). For default qdiscs, we can remove the redundant enqueue/dequeue and simply xmit the skb since the default qdisc is work-conserving. The patch uses a new flag - TCQ_F_CAN_BYPASS to identify the default fast queue. The controversial part of the patch is incrementing qlen when a skb is requeued - this is to avoid checks like the second line below: + } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) && >> !q->gso_skb && + !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) { Results of a 2 hour testing for multiple netperf sessions (1, 2, 4, 8, 12 sessions on a 4 cpu system-X). The BW numbers are aggregate Mb/s across iterations tested with this version on System-X boxes with Chelsio 10gbps cards: ---------------------------------- Size | ORG BW NEW BW | ---------------------------------- 128K | 156964 159381 | 256K | 158650 162042 | ---------------------------------- Changes from ver1: 1. Move sch_direct_xmit declaration from sch_generic.h to pkt_sched.h 2. Update qdisc basic statistics for direct xmit path. 3. Set qlen to zero in qdisc_reset. 4. Changed some function names to more meaningful ones. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05net: mark read-only arrays as constJan Engelhardt
String literals are constant, and usually, we can also tag the array of pointers const too, moving it to the .rodata section. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05net: Fix spinlock use in alloc_netdev_mq()Ingo Molnar
-tip testing found this lockdep warning: [ 2.272010] calling net_dev_init+0x0/0x164 @ 1 [ 2.276033] device class 'net': registering [ 2.280191] INFO: trying to register non-static key. [ 2.284005] the code is fine but needs lockdep annotation. [ 2.284005] turning off the locking correctness validator. [ 2.284005] Pid: 1, comm: swapper Not tainted 2.6.31-rc5-tip #1145 [ 2.284005] Call Trace: [ 2.284005] [<7958eb4e>] ? printk+0xf/0x11 [ 2.284005] [<7904f83c>] __lock_acquire+0x11b/0x622 [ 2.284005] [<7908c9b7>] ? alloc_debug_processing+0xf9/0x144 [ 2.284005] [<7904e2be>] ? mark_held_locks+0x3a/0x52 [ 2.284005] [<7908dbc4>] ? kmem_cache_alloc+0xa8/0x13f [ 2.284005] [<7904e475>] ? trace_hardirqs_on_caller+0xa2/0xc3 [ 2.284005] [<7904fdf6>] lock_acquire+0xb3/0xd0 [ 2.284005] [<79489678>] ? alloc_netdev_mq+0xf5/0x1ad [ 2.284005] [<79591514>] _spin_lock_bh+0x2d/0x5d [ 2.284005] [<79489678>] ? alloc_netdev_mq+0xf5/0x1ad [ 2.284005] [<79489678>] alloc_netdev_mq+0xf5/0x1ad [ 2.284005] [<793a38f2>] ? loopback_setup+0x0/0x74 [ 2.284005] [<798eecd0>] loopback_net_init+0x20/0x5d [ 2.284005] [<79483efb>] register_pernet_device+0x23/0x4b [ 2.284005] [<798f5c9f>] net_dev_init+0x115/0x164 [ 2.284005] [<7900104f>] do_one_initcall+0x4a/0x11a [ 2.284005] [<798f5b8a>] ? net_dev_init+0x0/0x164 [ 2.284005] [<79066f6d>] ? register_irq_proc+0x8c/0xa8 [ 2.284005] [<798cc29a>] do_basic_setup+0x42/0x52 [ 2.284005] [<798cc30a>] kernel_init+0x60/0xa1 [ 2.284005] [<798cc2aa>] ? kernel_init+0x0/0xa1 [ 2.284005] [<79003e03>] kernel_thread_helper+0x7/0x10 [ 2.284078] device: 'lo': device_add [ 2.288248] initcall net_dev_init+0x0/0x164 returned 0 after 11718 usecs [ 2.292010] calling neigh_init+0x0/0x66 @ 1 [ 2.296010] initcall neigh_init+0x0/0x66 returned 0 after 0 usecs it's using an zero-initialized spinlock. This is a side-effect of: dev_unicast_init(dev); in alloc_netdev_mq() making use of dev->addr_list_lock. The device has just been allocated freshly, it's not accessible anywhere yet so no locking is needed at all - in fact it's wrong to lock it here (the lock isnt initialized yet). This bug was introduced via: | commit a6ac65db2329e7685299666f5f7b6093c7b0f3a0 | Date: Thu Jul 30 01:06:12 2009 +0000 | | net: restore the original spinlock to protect unicast list Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jiri Pirko <jpirko@redhat.com> Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02net: restore the original spinlock to protect unicast listJiri Pirko
There is a path when an assetion in dev_unicast_sync() appears. igmp6_group_added -> dev_mc_add -> __dev_set_rx_mode -> -> vlan_dev_set_rx_mode -> dev_unicast_sync Therefore we cannot protect this list with rtnl. This patch restores the original protecting this list with spinlock. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-27cfg80211: make aware of net namespacesJohannes Berg
In order to make cfg80211/nl80211 aware of network namespaces, we have to do the following things: * del_virtual_intf method takes an interface index rather than a netdev pointer - simply change this * nl80211 uses init_net a lot, it changes to use the sender's network namespace * scan requests use the interface index, hold a netdev pointer and reference instead * we want a wiphy and its associated virtual interfaces to be in one netns together, so - we need to be able to change ns for a given interface, so export dev_change_net_namespace() - for each virtual interface set the NETIF_F_NETNS_LOCAL flag, and clear that flag only when the wiphy changes ns, to disallow breaking this invariant * when a network namespace goes away, we need to reparent the wiphy to init_net * cfg80211 users that support creating virtual interfaces must create them in the wiphy's namespace, currently this affects only mac80211 The end result is that you can now switch an entire wiphy into a different network namespace with the new command iw phy#<idx> set netns <pid> and all virtual interfaces will follow (or the operation fails). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-24net: export __dev_addr_sync/__dev_addr_unsyncJohannes Berg
For mac80211, with the master netdev removal, we need to be able to sync a multicast address list onto another list that is not tracked within a netdev, so we need access to the functions doing that. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-07-05net: convert remaining non-symbolic return values in ndo_start_xmit() functionsPatrick McHardy
This patch converts the remaining occurences of raw return values to their symbolic counterparts in ndo_start_xmit() functions that were missed by the previous automatic conversion. Additionally code that assumed the symbolic value of NETDEV_TX_OK to be zero is changed to explicitly use NETDEV_TX_OK. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-26gro: Flush GRO packets in napi_disable_pending pathHerbert Xu
When NAPI is disabled while we're in net_rx_action, we end up calling __napi_complete without flushing GRO packets. This is a bug as it would cause the GRO packets to linger, of course it also literally BUGs to catch error like this :) This patch changes it to napi_complete, with the obligatory IRQ reenabling. This should be safe because we've only just disabled IRQs and it does not materially affect the test conditions in between. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-23net: Move rx skb_orphan call to where neededHerbert Xu
In order to get the tun driver to account packets, we need to be able to receive packets with destructors set. To be on the safe side, I added an skb_orphan call for all protocols by default since some of them (IP in particular) cannot handle receiving packets destructors properly. Now it seems that at least one protocol (CAN) expects to be able to pass skb->sk through the rx path without getting clobbered. So this patch attempts to fix this properly by moving the skb_orphan call to where it's actually needed. In particular, I've added it to skb_set_owner_[rw] which is what most users of skb->destructor call. This is actually an improvement for tun too since it means that we only give back the amount charged to the socket when the skb is passed to another socket that will also be charged accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Oliver Hartkopp <olver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18net: group address list and its countJiri Pirko
This patch is inspired by patch recently posted by Johannes Berg. Basically what my patch does is to group list and a count of addresses into newly introduced structure netdev_hw_addr_list. This brings us two benefits: 1) struct net_device becames a bit nicer. 2) in the future there will be a possibility to operate with lists independently on netdevices (with exporting right functions). I wanted to introduce this patch before I'll post a multicast lists conversion. Signed-off-by: Jiri Pirko <jpirko@redhat.com> drivers/net/bnx2.c | 4 +- drivers/net/e1000/e1000_main.c | 4 +- drivers/net/ixgbe/ixgbe_main.c | 6 +- drivers/net/mv643xx_eth.c | 2 +- drivers/net/niu.c | 4 +- drivers/net/virtio_net.c | 10 ++-- drivers/s390/net/qeth_l2_main.c | 2 +- include/linux/netdevice.h | 17 +++-- net/core/dev.c | 130 ++++++++++++++++++-------------------- 9 files changed, 89 insertions(+), 90 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-15Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/scsi/fcoe/fcoe.c net/core/drop_monitor.c net/core/net-traces.c
2009-06-11bridge: Simplify interface for ATM LANEMichał Mirosław
This patch changes FDB entry check for ATM LANE bridge integration. There's no point in holding a FDB entry around SKB building. br_fdb_get()/br_fdb_put() pair are changed into single br_fdb_test_addr() hook that checks if the addr has FDB entry pointing to other port to the one the request arrived on. FDB entry refcounting is removed as it's not used anywhere else. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11[PATCH] net core: Some interface flags not returned by SIOCGIFFLAGSJohn Dykstra
Commit b00055aacdb172c05067612278ba27265fcd05ce " [NET] core: add RFC2863 operstate" defined new interface flag values. Its documentation specified that these flags could be accessed from user space via SIOCGIFFLAGS. However, this does not work because the new flags do not fit in that ioctl's argument width. Change the documentation to match the code's behavior. Also change the source to explicitly show the truncation. This _should_ have no effect on executable code, and did not with gcc 4.2.4 generating x86 code. A new ioctl could be defined to return all interface flags to user space. However, since this has been broken for three years with no one complaining, there doesn't seem much need. They are still accessible via netlink. Reported-by: "Fredrik Arnerup" <fredrik.arnerup@edgeware.tv> Signed-off-by: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09Add constants for the ieee 802.15.4 stackSergey Lapin
IEEE 802.15.4 stack requires several constants to be defined/adjusted. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: Sergey Lapin <slapin@ossfans.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09net: dev_addr_init() fixEric Dumazet
commit f001fde5eadd915f4858d22ed70d7040f48767cf (net: introduce a list of device addresses dev_addr_list (v6)) added one regression Vegard Nossum found in its testings. With kmemcheck help, Vegard found some uninitialized memory was read and reported to user, potentialy leaking kernel data. ( thread can be found on http://lkml.org/lkml/2009/5/30/177 ) dev_addr_init() incorrectly uses sizeof() operator. We were initializing one byte instead of MAX_ADDR_LEN bytes. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09net/core/dev.c: Use frag list abstraction interfaces.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03net: introduce pre-up netdev notifierJohannes Berg
NETDEV_UP is called after the device is set UP, but sometimes it is useful to be able to veto the device UP. Introduce a new NETDEV_PRE_UP notifier that can be used for exactly this. The first use case will be cfg80211 denying interfaces to be set UP if the device is known to be rfkill'ed. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-03net: skb->dst accessorsEric Dumazet
Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-29net: convert unicast addr listJiri Pirko
This patch converts unicast address list to standard list_head using previously introduced struct netdev_hw_addr. It also relaxes the locking. Original spinlock (still used for multicast addresses) is not needed and is no longer used for a protection of this list. All reading and writing takes place under rtnl (with no changes). I also removed a possibility to specify the length of the address while adding or deleting unicast address. It's always dev->addr_len. The convertion touched especially e1000 and ixgbe codes when the change is not so trivial. Signed-off-by: Jiri Pirko <jpirko@redhat.com> drivers/net/bnx2.c | 13 +-- drivers/net/e1000/e1000_main.c | 24 +++-- drivers/net/ixgbe/ixgbe_common.c | 14 ++-- drivers/net/ixgbe/ixgbe_common.h | 4 +- drivers/net/ixgbe/ixgbe_main.c | 6 +- drivers/net/ixgbe/ixgbe_type.h | 4 +- drivers/net/macvlan.c | 11 +- drivers/net/mv643xx_eth.c | 11 +- drivers/net/niu.c | 7 +- drivers/net/virtio_net.c | 7 +- drivers/s390/net/qeth_l2_main.c | 6 +- drivers/scsi/fcoe/fcoe.c | 16 ++-- include/linux/netdevice.h | 18 ++-- net/8021q/vlan.c | 4 +- net/8021q/vlan_dev.c | 10 +- net/core/dev.c | 195 +++++++++++++++++++++++++++----------- net/dsa/slave.c | 10 +- net/packet/af_packet.c | 4 +- 18 files changed, 227 insertions(+), 137 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27net: ALIGN/PTR_ALIGN cleanup in alloc_netdev_mq()/netdev_priv()Eric Dumazet
Use ALIGN() and PTR_ALIGN() macros instead of handcoding them. Get rid of NETDEV_ALIGN_CONST ugly define Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Open-code final pskb_may_pullHerbert Xu
As we know the only packets which need the final pskb_may_pull are completely non-linear, and have all the required bits in frag0, we can perform a straight memcpy instead of going through pskb_may_pull and doing skb_copy_bits. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Avoid unnecessary comparison after skb_gro_headerHerbert Xu
For the overwhelming majority of cases, skb_gro_header's return value cannot be NULL. Yet we must check it because of its current form. This patch splits it up into multiple functions in order to avoid this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Optimise length comparison in skb_gro_headerHerbert Xu
By caching frag0_len, we can avoid checking both frag0 and the length separately in skb_gro_header. This helps as skb_gro_header is called four times per packet which amounts to a few million times at 10Gb/s. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Only use skb_gro_header for completely non-linear packetsHerbert Xu
Currently skb_gro_header is used for packets which put the hardware header in skb->data with the rest in frags. Since the drivers that need this optimisation all provide completely non-linear packets, we can gain extra optimisations by only performing the frag0 optimisation for completely non-linear packets. In particular, we can simply test frag0 (instead of skb_headlen) to see whether the optimisation is in force. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27gro: Inline skb_gro_header and cache frag0 virtual addressHerbert Xu
The function skb_gro_header is called four times per packet which quickly adds up at 10Gb/s. This patch inlines it to allow better optimisations. Some architectures perform multiplication for page_address, which is done by each skb_gro_header invocation. This patch caches that value in skb->cb to avoid the unnecessary multiplications. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25net: txq_trans_update() helperEric Dumazet
We would like to get rid of netdev->trans_start = jiffies; that about all net drivers have to use in their start_xmit() function, and use txq->trans_start instead. This can be done generically in core network, as suggested by David. Some devices, (particularly loopback) dont need trans_start update, because they dont have transmit watchdog. We could add a new device flag, or rely on fact that txq->tran_start can be updated is txq->xmit_lock_owner is different than -1. Use a helper function to hide our choice. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25net: remove COMPAT_NET_DEV_OPSAlexander Beregalov
All drivers are already converted to new net_device_ops API and nobody uses old API anymore. Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21dropmon: add ability to detect when hardware dropsrxpacketsNeil Horman
Patch to add the ability to detect drops in hardware interfaces via dropwatch. Adds a tracepoint to net_rx_action to signal everytime a napi instance is polled. The dropmon code then periodically checks to see if the rx_frames counter has changed, and if so, adds a drop notification to the netlink protocol, using the reserved all-0's vector to indicate the drop location was in hardware, rather than somewhere in the code. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> include/linux/net_dropmon.h | 8 ++ include/trace/napi.h | 11 +++ net/core/dev.c | 5 + net/core/drop_monitor.c | 124 ++++++++++++++++++++++++++++++++++++++++++-- net/core/net-traces.c | 4 + net/core/netpoll.c | 2 6 files changed, 149 insertions(+), 5 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18net: release dst entry in dev_hard_start_xmit()Eric Dumazet
One point of contention in high network loads is the dst_release() performed when a transmited skb is freed. This is because NIC tx completion calls dev_kree_skb() long after original call to dev_queue_xmit(skb). CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is quite visible if one CPU is 100% handling softirqs for a network device, since dst_clone() is done by other cpus, involving cache line ping pongs. It seems right place to release dst is in dev_hard_start_xmit(), for most devices but ones that are virtual, and some exceptions. David Miller suggested to define a new device flag, set in alloc_netdev_mq() (so that most devices set it at init time), and carefuly unset in devices which dont want a NULL skb->dst in their ndo_start_xmit(). List of devices that must clear this flag is : - loopback device, because it calls netif_rx() and quoting Patrick : "ip_route_input() doesn't accept loopback addresses, so loopback packets already need to have a dst_entry attached." - appletalk/ipddp.c : needs skb->dst in its xmit function - And all devices that call again dev_queue_xmit() from their xmit function (as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18net: add tx_packets/tx_bytes/tx_dropped counters in struct netdev_queueEric Dumazet
offsetof(struct net_device, features)=0x44 offsetof(struct net_device, stats.tx_packets)=0x54 offsetof(struct net_device, stats.tx_bytes)=0x5c offsetof(struct net_device, stats.tx_dropped)=0x6c Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their tx path can slow down SMP operations, since they dirty a cache line that should stay shared (dev->features is needed in rx and tx paths) We could move away stats field in net_device but it wont help that much. (Two cache lines dirtied in tx path, we can do one only) Better solution is to add tx_packets/tx_bytes/tx_dropped in struct netdev_queue because this structure is already touched in tx path and counters updates will then be free (no increase in size) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-09net: check retval of dev_addr_init()Jiri Pirko
Add missed checking of dev_addr_init return value in alloc_netdev_mq. Signed-off-by: Jiri Pirko <jpirko@redhat.com> net/core/dev.c | 15 ++++++++++++--- 1 files changed, 12 insertions(+), 3 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-05net: introduce a list of device addresses dev_addr_list (v6)Jiri Pirko
v5 -> v6 (current): -removed so far unused static functions -corrected dev_addr_del_multiple to call del instead of add v4 -> v5: -added device address type (suggested by davem) -removed refcounting (better to have simplier code then safe potentially few bytes) v3 -> v4: -changed kzalloc to kmalloc in __hw_addr_add_ii() -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init() v2 -> v3: -removed unnecessary rcu read locking -moved dev_addr_flush() calling to ensure no null dereference of dev_addr v1 -> v2: -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush -removed unnecessary rcu_read locking in dev_addr_init -use compare_ether_addr_64bits instead of compare_ether_addr -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr -use call_rcu instead of rcu_synchronize -moved is_etherdev_addr into __KERNEL__ ifdef This patch introduces a new list in struct net_device and brings a set of functions to handle the work with device address list. The list is a replacement for the original dev_addr field and because in some situations there is need to carry several device addresses with the net device. To be backward compatible, dev_addr is made to point to the first member of the list so original drivers sees no difference. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-03net: Avoid modulus in skb_tx_hash() for forwarding case.David S. Miller
Based almost entirely upon a patch by Eric Dumazet. The common case is to have num-tx-queues <= num_rx_queues and even if num_tx_queues is larger it will not be significantly larger. Therefore, a subtraction loop is always going to be faster than modulus. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-03Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2009-05-01net: Fix skb_tx_hash() for forwarding workloads.Eric Dumazet
When skb_rx_queue_recorded() is true, we dont want to use jash distribution as the device driver exactly told us which queue was selected at RX time. jhash makes a statistical shuffle, but this wont work with 8 static inputs. Later improvements would be to compute reciprocal value of real_num_tx_queues to avoid a divide here. But this computation should be done once, when real_num_tx_queues is set. This needs a separate patch, and a new field in struct net_device. Reported-by: Andrew Dickinson <andrew@whydna.net> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27gro: Fix handling of headers that extend over the tailHerbert Xu
The skb_gro_* code fails to handle the case where a header starts in the linear area but ends in the frags area. Since the goal of skb_gro_* is to optimise the case of completely non-linear packets, we can simply bail out if we have anything in the linear area. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-21Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/core/dev.c
2009-04-20net: Fix GRO for multiple page fragmentsBen Hutchings
This loop over fragments in napi_fraginfo_skb() was "interesting". Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-20net: fix "compatibility" typosMarcin Slusarz
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-20net: sch_netem: Fix an inconsistency in ingress netem timestamps.Jarek Poplawski
Alex Sidorenko reported: "while experimenting with 'netem' we have found some strange behaviour. It seemed that ingress delay as measured by 'ping' command shows up on some hosts but not on others. After some investigation I have found that the problem is that skbuff->tstamp field value depends on whether there are any packet sniffers enabled. That is: - if any ptype_all handler is registered, the tstamp field is as expected - if there are no ptype_all handlers, the tstamp field does not show the delay" This patch prevents unnecessary update of tstamp in dev_queue_xmit_nit() on ingress path (with act_mirred) adding a check, so minimal overhead on the fast path, but only when sniffers etc. are active. Since netem at ingress seems to logically emulate a network before a host, tstamp is zeroed to trigger the update and pretend delays are from the outside. Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com> Tested-by: Alex Sidorenko <alexandre.sidorenko@hp.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>