path: root/net/core/dev.c
Age | Commit message | Author
2008-07-17 | netdev: Add netdev->select_queue() method. | David S. Miller
Devices or device layers can set this to control the queue selection performed by dev_pick_tx(). This function runs under RCU protection, which allows overriding functions to have some way of synchronizing with things like dynamic ->real_num_tx_queues adjustments. This makes the spinlock prefetch in dev_queue_xmit() a little bit less effective, but that's the price right now for correctness. Signed-off-by: David S. Miller <davem@davemloft.net>
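As a rough illustration of the override hook described in this commit, here is a userspace model in C. struct model_dev, model_pick_tx() and always_queue_zero() are hypothetical stand-ins, not the kernel's net_device, dev_pick_tx() or select_queue signature. The hash-modulo fallback is an assumption for illustration, and the RCU protection mentioned above is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's struct net_device. */
struct model_dev;
typedef uint16_t (*select_queue_fn)(struct model_dev *dev, uint32_t flow_hash);

struct model_dev {
	select_queue_fn select_queue;  /* optional override, may be NULL */
	uint16_t real_num_tx_queues;   /* number of active TX queues, must be > 0 */
};

/* dev_pick_tx()-style selection: use the device's override if present,
 * otherwise (assumption) spread flows across the active queues by hash. */
static uint16_t model_pick_tx(struct model_dev *dev, uint32_t flow_hash)
{
	if (dev->select_queue)
		return dev->select_queue(dev, flow_hash);
	return (uint16_t)(flow_hash % dev->real_num_tx_queues);
}

/* Example override: send everything to TX queue zero. */
static uint16_t always_queue_zero(struct model_dev *dev, uint32_t flow_hash)
{
	(void)dev;
	(void)flow_hash;
	return 0;
}
```

Keeping the override as a nullable function pointer means devices that do not care pay only one pointer test per packet.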
2008-07-17 | net: Use queue aware tests throughout. | David S. Miller
This effectively "flips the switch" by making the core networking and multiqueue-aware drivers use the new TX multiqueue structures. Non-multiqueue drivers need no changes. The interfaces they use, such as netif_stop_queue(), degenerate into an operation on TX queue zero. So everything "just works" for them. Code that really wants to do "X" to all TX queues now invokes a routine that does so, such as netif_tx_wake_all_queues(), netif_tx_stop_all_queues(), etc. pktgen and netpoll required a little bit more surgery than the others. In particular the pktgen changes, whilst functional, could be largely improved. The initial check in pktgen_xmit() will sometimes check the wrong queue, which is mostly harmless. The thing to do is probably to invoke fill_packet() earlier. The bulk of the netpoll changes is to make the code operate solely on the TX queue indicated by the SKB queue mapping. Setting of the SKB queue mapping is entirely confined inside of net/core/dev.c:dev_pick_tx(). If we end up needing any kind of special semantics (drops, for example) it will be implemented here. Finally, we now have a "real_num_tx_queues" which is where the driver indicates how many TX queues are actually active. With IGB changes from Jeff Kirsher. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 | netdev: Allocate multiple queues for TX. | David S. Miller
alloc_netdev_mq() now allocates an array of netdev_queue structures for TX, based upon the queue_count argument. Furthermore, all accesses to the TX queues are now vectored through the netdev_get_tx_queue() and netdev_for_each_tx_queue() interfaces. This makes it easy to grep the tree for all things that want to get to a TX queue of a net device. Problem spots which are not really multiqueue aware yet, and only work with one queue, can easily be spotted by grepping for all netdev_get_tx_queue() calls that pass in a zero index. Signed-off-by: David S. Miller <davem@davemloft.net>
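The allocate-an-array-then-vector-all-access pattern described above can be sketched in userspace. The names model_alloc_tx_queues(), model_get_tx_queue() and model_for_each_tx_queue() are hypothetical stand-ins for alloc_netdev_mq(), netdev_get_tx_queue() and netdev_for_each_tx_queue(); their signatures here are assumptions, not the kernel's.

```c
#include <stdlib.h>

/* Hypothetical model of a per-device TX queue. */
struct model_queue {
	unsigned int index;
	int stopped;
};

struct model_dev {
	struct model_queue *tx;      /* array of num_tx_queues entries */
	unsigned int num_tx_queues;
};

/* Allocate queue_count TX queues in one zeroed array. */
static int model_alloc_tx_queues(struct model_dev *dev, unsigned int queue_count)
{
	dev->tx = calloc(queue_count, sizeof(*dev->tx));
	if (!dev->tx)
		return -1;
	dev->num_tx_queues = queue_count;
	for (unsigned int i = 0; i < queue_count; i++)
		dev->tx[i].index = i;
	return 0;
}

/* Single access point for one queue, so every user is easy to grep for. */
static struct model_queue *model_get_tx_queue(struct model_dev *dev, unsigned int i)
{
	return &dev->tx[i];
}

/* Apply fn to every TX queue of the device. */
static void model_for_each_tx_queue(struct model_dev *dev,
				    void (*fn)(struct model_dev *, struct model_queue *))
{
	for (unsigned int i = 0; i < dev->num_tx_queues; i++)
		fn(dev, &dev->tx[i]);
}

/* Example callback, in the spirit of a "stop all queues" helper. */
static void stop_queue(struct model_dev *dev, struct model_queue *q)
{
	(void)dev;
	q->stopped = 1;
}
```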
2008-07-16 | Merge branch 'linus' into cpus4096 | Ingo Molnar
Conflicts: arch/x86/xen/smp.c kernel/sched_rt.c net/iucv/iucv.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15 | netdev: Do not use TX lock to protect address lists. | David S. Miller
Now that we have a specific lock to protect the network device unicast and multicast lists, remove extraneous grabs of the TX lock in cases where the code only needs address list protection. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 | netdev: Add netdev->addr_list_lock protection. | David S. Miller
Add netif_addr_{lock,unlock}{,_bh}() helpers. Use them to protect operations that operate on or read the network device unicast and multicast address lists. Also use them in cases where the code simply wants to block calls into the driver's ->set_rx_mode() and ->set_multicast_list() methods. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 | netdev: Add addr_list_lock to struct net_device. | David S. Miller
This will be used to protect the per-device unicast and multicast address lists, as well as the callbacks into the drivers that configure such state, such as ->set_rx_mode() and ->set_multicast_list(). Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 | vlan: deliver packets received with VLAN acceleration to network taps | Patrick McHardy
When VLAN header stripping is used, packets currently bypass packet sockets (and other network taps) completely. For locally existing VLANs, they appear directly on the VLAN device, for unknown VLANs they are silently dropped. Add a new function netif_nit_deliver() to deliver incoming packets to all network interface taps and use it in __vlan_hwaccel_rx() to make VLAN packets visible on the underlying device. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 | Merge branch 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  softirq: remove irqs_disabled warning from local_bh_enable
  softirq: remove initialization of static per-cpu variable
  Remove argument from open_softirq which is always NULL
2008-07-08 | netdev: Move _xmit_lock and xmit_lock_owner into netdev_queue. | David S. Miller
Accesses are mostly structured such that when there are multiple TX queues the code transformations will be a little bit simpler. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 | pkt_sched: Make qdisc_run take a netdev_queue. | David S. Miller
This allows us to use this calling convention all the way down into qdisc_restart(). Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 | netdev: Make netif_schedule() routines work with netdev_queue objects. | David S. Miller
Only plain netif_schedule() remains taking a net_device, mostly as a compatibility item while we transition the rest of these interfaces. Everything else calls netif_schedule_queue() or __netif_schedule(), both of which take a netdev_queue pointer. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 | netdev: Move next_sched into struct netdev_queue. | David S. Miller
We schedule queues, not the device, for output queue processing in BH. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 | netdev: Kill qdisc_ingress, use netdev->rx_queue.qdisc instead. | David S. Miller
Now that our qdisc management is bi-directional, per-queue, and fully orthogonal, there is no reason to have a special ingress qdisc pointer in struct net_device. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 | netdev: Move rest of qdisc state into struct netdev_queue | David S. Miller
Now qdisc, qdisc_sleeping, and qdisc_list also live there. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 | netdev: The ingress_lock member is no longer needed. | David S. Miller
Every qdisc is associated with a queue, and in the case of ingress qdiscs that will now be netdev->rx_queue, so using that queue's lock is the thing to do. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 | netdev: Move queue_lock into struct netdev_queue. | David S. Miller
The lock is now an attribute of the device queue. One thing to notice is that "suspicious" places emerge which will need specific training about multiple queue handling. They are so marked with explicit "netdev->rx_queue" and "netdev->tx_queue" references. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 | netdev: Create netdev_queue abstraction. | David S. Miller
A netdev_queue is an entity managed by a qdisc. Currently there is one RX and one TX queue, and a netdev_queue merely contains a backpointer to the net_device. The Qdisc struct is augmented with a netdev_queue pointer as well. Eventually the 'dev' Qdisc member will go away and we will have the resulting hierarchy: net_device --> netdev_queue --> Qdisc Also, qdisc_alloc() and qdisc_create_dflt() now take a netdev_queue pointer argument. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-06 | net: fix dev_set_promiscuity() breakage | Patrick McHardy
Commit dad9b335 (netdevice: Fix promiscuity and allmulti overflow) broke dev_set_promiscuity() by returning on success without reprogramming the device. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-06 | Merge commit 'v2.6.26-rc9' into cpus4096 | Ingo Molnar
2008-07-05 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 | David S. Miller
Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wan/hdlc_fr.c drivers/net/wireless/iwlwifi/iwl-4965.c drivers/net/wireless/iwlwifi/iwl3945-base.c
2008-07-01 | netdevice: Fix wrong string handle in kernel command line parsing | Wang Chen
v1->v2: Use strlcpy() to ensure s[i].name is NUL-terminated. 1. In netdev_boot_setup_add(), a long name will leak. ex.: dev=21,0x1234,0x1234,0x2345,eth123456789verylongname......... 2. In netdev_boot_setup_check(), a mismatch will happen if s[i].name is a prefix of dev->name. ex.: dev=...eth1 dev=...eth11 [ With feedback from Ben Hutchings. ] Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
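Both string bugs can be reproduced in userspace. This sketch assumes a 16-byte name buffer (the value of the kernel's IFNAMSIZ) and uses snprintf() for the bounded copy, since strlcpy() is not available in every C library. copy_name(), name_matches_prefix() and name_matches_exact() are hypothetical helpers, not the code in dev.c.

```c
#include <stdio.h>
#include <string.h>

#define MODEL_IFNAMSIZ 16   /* same value as the kernel's IFNAMSIZ */

/* Copy a user-supplied name into a fixed buffer; the result is always
 * NUL-terminated and silently truncated if too long. */
static void copy_name(char dst[MODEL_IFNAMSIZ], const char *src)
{
	snprintf(dst, MODEL_IFNAMSIZ, "%s", src);
}

/* Buggy comparison: only the first strlen(saved) bytes are compared,
 * so a saved "eth1" also matches a device called "eth11". */
static int name_matches_prefix(const char *saved, const char *devname)
{
	return strncmp(saved, devname, strlen(saved)) == 0;
}

/* Fixed comparison: the whole name must match. */
static int name_matches_exact(const char *saved, const char *devname)
{
	return strcmp(saved, devname) == 0;
}
```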
2008-06-28 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 | David S. Miller
Conflicts: drivers/net/wireless/iwlwifi/iwl4965-base.c
2008-06-27 | netdevice: Fix typo of dev_unicast_add() comment | Wang Chen
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 | Merge branch 'linus' into core/softirq | Ingo Molnar
2008-06-20 | netns: Don't receive new packets in a dead network namespace. | Eric W. Biederman
Alexey Dobriyan <adobriyan@gmail.com> writes:
> Subject: ICMP sockets destruction vs ICMP packets oops
> After icmp_sk_exit() nuked ICMP sockets, we get an interrupt.
> icmp_reply() wants ICMP socket.
>
> Steps to reproduce:
>
> launch shell in new netns
> move real NIC to netns
> setup routing
> ping -i 0
> exit from shell
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30
> PGD 17f3cd067 PUD 17f3ce067 PMD 0
> Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU 0
> Modules linked in: usblp usbcore
> Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4
> RIP: 0010:[<ffffffff803fce17>] [<ffffffff803fce17>] icmp_sk+0x17/0x30
> RSP: 0018:ffffffff8057fc30 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900
> RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800
> RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815
> R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28
> R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800
> FS: 0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0)
> Stack: 0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4
>        ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246
>        000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436
> Call Trace:
> <IRQ> [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0
> [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360
> [<ffffffff803fd645>] icmp_echo+0x65/0x70
> [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0
> [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0
> [<ffffffff803d71bb>] ip_rcv+0x33b/0x650
> [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340
> [<ffffffff803be57d>] process_backlog+0x9d/0x100
> [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250
> [<ffffffff80237be5>] __do_softirq+0x75/0x100
> [<ffffffff8020c97c>] call_softirq+0x1c/0x30
> [<ffffffff8020f085>] do_softirq+0x65/0xa0
> [<ffffffff80237af7>] irq_exit+0x97/0xa0
> [<ffffffff8020f198>] do_IRQ+0xa8/0x130
> [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60
> [<ffffffff8020bc46>] ret_from_intr+0x0/0xf
> <EOI> [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60
> [<ffffffff80212f23>] ? mwait_idle+0x43/0x60
> [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0
> [<ffffffff8040f380>] ? rest_init+0x70/0x80
> Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
> 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08
> 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00
> RIP [<ffffffff803fce17>] icmp_sk+0x17/0x30
> RSP <ffffffff8057fc30>
> CR2: 0000000000000000
> ---[ end trace ea161157b76b33e8 ]---
> Kernel panic - not syncing: Aiee, killing interrupt handler!
Receiving packets while we are cleaning up a network namespace is a racy proposition. It is possible when the packet arrives that we have removed some but not all of the state we need to fully process it. We have the choice of either playing whack-a-mole with the cleanup routines or simply dropping packets when we don't have a network namespace to handle them. Since the check looks inexpensive in netif_receive_skb let's just drop the incoming packets. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 | net: Disable LRO on devices that are forwarding | Ben Hutchings
Large Receive Offload (LRO) is only appropriate for packets that are destined for the host, and should be disabled if received packets may be forwarded. It can also confuse the GSO on output. Add dev_disable_lro() function which uses the appropriate ethtool ops to disable LRO if enabled. Add calls to dev_disable_lro() in br_add_if() and functions that enable IPv4 and IPv6 forwarding. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
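A minimal model of the dev_disable_lro() idea, assuming a driver exposes get/set hooks for a flags word that includes an LRO bit. struct model_ethtool_ops, MODEL_FLAG_LRO, model_disable_lro() and the toy driver are all hypothetical; the kernel's ethtool_ops layout and flag values are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_FLAG_LRO 0x8000u  /* hypothetical LRO flag bit */

/* Hypothetical subset of a driver's configuration hooks. */
struct model_ethtool_ops {
	uint32_t (*get_flags)(void *priv);
	int (*set_flags)(void *priv, uint32_t flags);
};

/* Clear LRO only if the driver provides both hooks and LRO is currently on. */
static void model_disable_lro(const struct model_ethtool_ops *ops, void *priv)
{
	if (ops && ops->get_flags && ops->set_flags) {
		uint32_t flags = ops->get_flags(priv);

		if (flags & MODEL_FLAG_LRO)
			ops->set_flags(priv, flags & ~MODEL_FLAG_LRO);
	}
}

/* Toy driver state and hooks used to exercise the sketch. */
static uint32_t toy_flags = MODEL_FLAG_LRO;

static uint32_t toy_get_flags(void *priv)
{
	(void)priv;
	return toy_flags;
}

static int toy_set_flags(void *priv, uint32_t flags)
{
	(void)priv;
	toy_flags = flags;
	return 0;
}
```

Checking for both hooks first means devices without the capability are simply left alone, which matches the "if enabled" wording of the commit.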
2008-06-18 | netdevice: Fix promiscuity and allmulti overflow | Wang Chen
Adding a positive @inc to a promiscuity or allmulti counter that is already at its maximum causes an overflow. For example: when allmulti=0xFFFFFFFF, any caller giving dev_set_allmulti() a positive @inc will turn allmulti off. This is not what we want, though it's a rare case. The fix is that only a negative @inc can turn allmulti or promiscuity off, and when any caller makes the counters hit the ceiling, we return an error. Change of v2: Change the void functions dev_set_promiscuity/allmulti to return int, so callers can get the overflow error. The callers' fix will be done later. Change of v3: 1. Since we return an error to the caller, we don't need to print KERN_ERROR; KERN_WARNING is enough. 2. In dev_set_promiscuity(), if __dev_set_promiscuity() failed, we return at once. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
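The overflow rule can be modeled with an unsigned counter. This sketch follows the commit description rather than copying dev.c; struct model_dev and model_set_promiscuity() are hypothetical, and MODEL_IFF_PROMISC reuses the value of IFF_PROMISC (0x100). The check only catches a counter that lands exactly on zero, which is enough for callers that pass +1 or -1.

```c
#include <errno.h>

#define MODEL_IFF_PROMISC 0x100u  /* same bit value as IFF_PROMISC */

struct model_dev {
	unsigned int promiscuity;  /* number of users that want promiscuous mode */
	unsigned int flags;
};

/* Returns 0 on success, -EOVERFLOW if a positive inc wrapped the counter. */
static int model_set_promiscuity(struct model_dev *dev, int inc)
{
	dev->promiscuity += inc;  /* unsigned arithmetic wraps modulo 2^32 */
	if (dev->promiscuity == 0) {
		if (inc < 0) {
			/* last user went away: turn promiscuous mode off */
			dev->flags &= ~MODEL_IFF_PROMISC;
		} else {
			/* wrapped past the maximum: undo and report */
			dev->promiscuity -= inc;
			return -EOVERFLOW;
		}
	} else {
		dev->flags |= MODEL_IFF_PROMISC;
	}
	return 0;
}
```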
2008-06-17 | net/core: add NETDEV_BONDING_FAILOVER event | Or Gerlitz
Add a NETDEV_BONDING_FAILOVER event to be used in a subsequent patch by bonding to announce fail-over for the active-backup mode through the netdev events notifier chain mechanism. Such an event can be of use for the RDMA CM (communication manager) to let native RDMA ULPs (eg NFS-RDMA, iSER) always be aligned with the IP stack, in the sense that they use the same ports/links as the stack does. It could also allow monitoring tools based on netlink events to be aware of bonding fail-over. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-16 | net: Fix test for VLAN TX checksum offload capability | Ben Hutchings
Selected device feature bits can be propagated to VLAN devices, so we can make use of TX checksum offload and TSO on VLAN-tagged packets. However, if the physical device does not do VLAN tag insertion or generic checksum offload then the test for TX checksum offload in dev_queue_xmit() will see a protocol of htons(ETH_P_8021Q) and yield false. This splits the checksum offload test into two functions:
- can_checksum_protocol() tests a given protocol against a feature bitmask
- dev_can_checksum() first tests the skb protocol against the device features; if that fails and the protocol is htons(ETH_P_8021Q) then it tests the encapsulated protocol against the effective device features for VLANs
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
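The split into two predicates can be sketched in userspace as follows. The feature bit values are hypothetical (not the kernel's NETIF_F_* values), ethertypes are kept in host byte order instead of being compared against htons() results, and the encapsulated protocol is passed in directly rather than read from a VLAN header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits. */
#define F_IP_CSUM   0x1u  /* can checksum TCP/UDP over IPv4 only */
#define F_GEN_CSUM  0x2u  /* can checksum any protocol */
#define F_IPV6_CSUM 0x4u  /* can checksum TCP/UDP over IPv6 only */

/* Ethertypes, host byte order. */
#define P_IP    0x0800
#define P_IPV6  0x86DD
#define P_8021Q 0x8100

/* Does this feature mask cover checksumming for this protocol? */
static bool can_checksum_protocol(unsigned long features, uint16_t protocol)
{
	return (features & F_GEN_CSUM) ||
	       ((features & F_IP_CSUM) && protocol == P_IP) ||
	       ((features & F_IPV6_CSUM) && protocol == P_IPV6);
}

struct model_dev {
	unsigned long features;
	unsigned long vlan_features;  /* features that also apply to VLAN-tagged frames */
};

/* protocol: outer ethertype; inner_protocol: encapsulated type when VLAN-tagged. */
static bool dev_can_checksum(const struct model_dev *dev, uint16_t protocol,
			     uint16_t inner_protocol)
{
	if (can_checksum_protocol(dev->features, protocol))
		return true;
	if (protocol == P_8021Q)
		return can_checksum_protocol(dev->features & dev->vlan_features,
					     inner_protocol);
	return false;
}
```

Masking with vlan_features means a capability only counts for tagged frames if the device advertises it for both plain and VLAN traffic.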
2008-05-25 | Remove argument from open_softirq which is always NULL | Carlos R. Mafra
As git-grep shows, open_softirq() is always called with the last argument being NULL:
block/blk-core.c: open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL);
kernel/hrtimer.c: open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq, NULL);
kernel/rcuclassic.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
kernel/rcupreempt.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
kernel/sched.c: open_softirq(SCHED_SOFTIRQ, run_rebalance_domains, NULL);
kernel/softirq.c: open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
kernel/softirq.c: open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
kernel/timer.c: open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
net/core/dev.c: open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL);
net/core/dev.c: open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);
This observation was already made by Matthew Wilcox in June 2002 (http://www.cs.helsinki.fi/linux/linux-kernel/2002-25/0687.html): "I notice that none of the current softirq routines use the data element passed to them." The situation hasn't changed since then. So it appears we can safely remove that extra argument to save 128 (54) bytes of kernel data (text). Signed-off-by: Carlos R. Mafra <crmafra@ift.unesp.br> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 | net: use performance variant for_each_cpu_mask_nr | Mike Travis
Change references from for_each_cpu_mask to for_each_cpu_mask_nr where appropriate Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-20 | net: Fix call to ->change_rx_flags(dev, IFF_MULTICAST) in dev_change_flags() | David Woodhouse
Am I just being particularly dim today, or can the call to dev->change_rx_flags(dev, IFF_MULTICAST) in dev_change_flags() never happen? We've just set dev->flags = flags & IFF_MULTICAST, effectively. So the condition '(dev->flags ^ flags) & IFF_MULTICAST' is _never_ going to be true. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
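Why the old test could never fire is easy to demonstrate: once the multicast bit has been copied into the device flags, XOR-ing the result with the requested flags always clears that bit. The sketch below contrasts that with comparing against a saved copy of the previous flags. apply_multicast() and multicast_changed() are hypothetical helpers, and MODEL_IFF_MULTICAST reuses the value of IFF_MULTICAST (0x1000).

```c
#define MODEL_IFF_MULTICAST 0x1000u  /* same bit value as IFF_MULTICAST */

/* Copy the requested multicast bit into the device flags. */
static unsigned int apply_multicast(unsigned int dev_flags, unsigned int flags)
{
	return (dev_flags & ~MODEL_IFF_MULTICAST) | (flags & MODEL_IFF_MULTICAST);
}

/* Nonzero if the multicast bit differs between two flag words. */
static int multicast_changed(unsigned int a, unsigned int b)
{
	return ((a ^ b) & MODEL_IFF_MULTICAST) != 0;
}
```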
2008-05-14 | net: handle errors from device_rename | Stephen Hemminger
device_rename can fail with -EEXIST or -ENOMEM, so handle any problems. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08 | net: Added ASSERT_RTNL() to dev_open() and dev_close(). | Ben Hutchings
dev_open() and dev_close() must be called holding the RTNL, since they call device functions and netdevice notifiers that are promised the RTNL. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08 | netns: Fix arbitrary net_device-s corruptions on net_ns stop. | Pavel Emelyanov
When a net namespace is destroyed, some devices (those not killed explicitly on ns stop) are moved back to init_net. The problem is that this net_ns change has one point of failure: __dev_alloc_name() may be called if a name collision occurs (and this is easy to trigger). This allocator performs a likely-to-fail GFP_ATOMIC allocation to find a suitable number. The other conditions that may cause an error (the device being ns-local or not registered) are always false in this case. So, when this call fails, the device is unregistered. But this is *not* the right thing to do, since after this the device may be released (and kfree-ed) improperly. E.g. bridges require more actions (sysfs update, timer disarming, etc.), some other devices want to remove their private areas from lists, etc. I.e. arbitrary use-after-free cases may occur. The proposed fix is the following: since the only reason for dev_change_net_namespace to fail is the name generation, we may give the device a unique fall-back name without %d-s in it, namely dev<ifindex>, since ifindexes are still unique. So make this change, raise the failure-case printk loglevel to EMERG and replace the unregister_netdevice call with BUG(). [ Use snprintf() -DaveM ] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
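The fall-back naming scheme can be sketched with snprintf(), assuming a 16-byte name buffer as with IFNAMSIZ. fallback_name() is a hypothetical helper, not the kernel's code path; it reports failure if the formatted name would not fit.

```c
#include <stdio.h>

#define MODEL_IFNAMSIZ 16  /* same value as the kernel's IFNAMSIZ */

/* Build "dev<ifindex>" from an ifindex that is already unique.
 * Returns 0 on success, -1 if the name would be truncated. */
static int fallback_name(char buf[MODEL_IFNAMSIZ], int ifindex)
{
	int n = snprintf(buf, MODEL_IFNAMSIZ, "dev%d", ifindex);

	return (n < 0 || n >= MODEL_IFNAMSIZ) ? -1 : 0;
}
```

Because the name contains no %d template, no allocation or search is needed, which is what removes the failure point.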
2008-05-02 | netns: Fix device renaming for sysfs | Daniel Lezcano
When a netdev is moved across namespaces with the 'dev_change_net_namespace' function, the 'device_rename' function is used to fix up the kobject and refresh the sysfs tree. The device_rename function calls kobject_rename, which checks whether an object with the same name already exists, and it does, because we are renaming the object to the same name. Using 'device_rename' seems wrong to me, because we don't actually rename the device, we just move it across namespaces. As we just want to do a mini "netdev_[un]register", IMO the functions 'netdev_[un]register_kobject' should be used instead, as in the usual network device [un]registration. This patch replaces device_rename with netdev_unregister_kobject, followed by netdev_register_kobject. The netdev_register_kobject will call device_initialize and will raise a warning indicating the device was already initialized. In order to fix that, I split the device initialization into a separate function and use it together with 'netdev_register_kobject' in register_netdevice. So we can safely call 'netdev_register_kobject' in 'dev_change_net_namespace'. This fix will allow properly using the per-namespace sysfs which is coming from the -mm tree. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-02 | net: remove NR_CPUS arrays in net/core/dev.c | Mike Travis
Remove the fixed size channels[NR_CPUS] array in net/core/dev.c and dynamically allocate array based on nr_cpu_ids. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
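Sizing a per-CPU array at run time instead of at compile time looks roughly like this in userspace. struct model_channel and alloc_channels() are hypothetical; in the kernel the count comes from nr_cpu_ids, whereas a userspace program might take it from sysconf(_SC_NPROCESSORS_CONF).

```c
#include <stdlib.h>

/* Placeholder per-CPU state. */
struct model_channel {
	unsigned long packets;
};

/* One zeroed slot per possible CPU id, instead of a fixed NR_CPUS array. */
static struct model_channel *alloc_channels(unsigned int nr_cpu_ids)
{
	return calloc(nr_cpu_ids, sizeof(struct model_channel));
}
```

The saving comes from NR_CPUS being a compile-time maximum (up to 4096 in the cpus4096 work) while the number of possible CPUs on a given machine is usually far smaller.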
2008-04-29 | Remove duplicated unlikely() in IS_ERR() | Hirofumi Nakagawa
Some drivers have duplicated unlikely() macros. IS_ERR() already has unlikely() in itself. This patch cleans up such pointless code. Signed-off-by: Hirofumi Nakagawa <hnakagawa@miraclelinux.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jeff Garzik <jeff@garzik.org> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
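The point of the cleanup is that IS_ERR() already carries the branch hint. The following userspace re-creation is modeled on include/linux/err.h of that era (MAX_ERRNO of 4095, unlikely() built on the GCC/Clang __builtin_expect builtin); treat it as a sketch rather than the kernel header itself.

```c
/* Userspace re-creation modeled on include/linux/err.h; not the kernel header. */
#define MAX_ERRNO 4095

/* Branch-prediction hint built on the GCC/Clang builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Error codes -1..-MAX_ERRNO map to the top 4095 pointer values. */
#define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* The hint is already inside, so callers write "if (IS_ERR(p))",
 * not "if (unlikely(IS_ERR(p)))". */
static inline long IS_ERR(const void *ptr)
{
	return IS_ERR_VALUE((unsigned long)ptr);
}
```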
2008-04-18 | [NET]: Fix and allocate less memory for ->priv'less netdevices | Alexey Dobriyan
This patch effectively reverts commit d0498d9ae1a5cebac363e38907266d5cd2eedf89 aka "[NET]: Do not allocate unneeded memory for dev->priv alignment." It was found to be buggy because of final unconditional += NETDEV_ALIGN_CONST removal. For example, for sizeof(struct net_device) being 2048 bytes, "alloc_size" was also 2048 bytes, but allocator with debugging options turned on started giving out !32-byte aligned memory resulting in redzones overwrites. Patch does small optimization in ->priv'less case: bumping size to next 32-byte boundary was always done to ensure ->priv will also be aligned. But, no ->priv, no need to do that. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
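The size arithmetic described in this commit can be sketched as follows, assuming NETDEV_ALIGN is 32 and NETDEV_ALIGN_CONST is 31. netdev_alloc_size() is a hypothetical helper: the private area is aligned only when one is requested, and the trailing NETDEV_ALIGN_CONST bytes of slack are always added so the whole block can be aligned after allocation.

```c
#include <stddef.h>

#define NETDEV_ALIGN       32
#define NETDEV_ALIGN_CONST (NETDEV_ALIGN - 1)

static size_t netdev_alloc_size(size_t sizeof_dev, size_t sizeof_priv)
{
	size_t alloc_size = sizeof_dev;

	if (sizeof_priv) {
		/* round up so the private area starts on a 32-byte boundary */
		alloc_size = (alloc_size + NETDEV_ALIGN_CONST) & ~(size_t)NETDEV_ALIGN_CONST;
		alloc_size += sizeof_priv;
	}
	/* slack that lets the whole construct be shifted to a 32-byte boundary */
	alloc_size += NETDEV_ALIGN_CONST;
	return alloc_size;
}
```

For the 2048-byte example in the message with no private area, netdev_alloc_size(2048, 0) yields 2079; dropping the unconditional slack is what produced the unaligned allocations.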
2008-04-16 | [NET]: Do not allocate unneeded memory for dev->priv alignment. | Pavel Emelyanov
The alloc_netdev_mq() tries to produce 32-byte alignment for both the net_device itself and its private data. The second alignment is achieved by adding NETDEV_ALIGN_CONST to the whole size of the memory to be allocated. However, for devices that do not need the private area, this addition just makes the net_device weigh 1024 + 32 = 1056 bytes, which no longer fits in a 1024-byte slab, i.e. it consumes twice as much memory. Since the loopback device is such a device (sizeof_priv == 0 for it), and each net namespace creates one, this can save a noticeable amount of memory for kernels with net namespaces turned on. After this patch the lo device is actually allocated from a size-1024 kmem cache on an i386 box even with NETPOLL and WIRELESS_EXT turned on. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 | [NETNS]: Add netns refcnt debug for network devices. | Denis V. Lunev
dev_set_net is called for
- just allocated devices
- devices moving from one namespace to another
release_net has a proper check inside to distinguish these cases. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 | David S. Miller
Conflicts: drivers/net/usb/rndis_host.c drivers/net/wireless/b43/dma.c net/ipv6/ndisc.c
2008-03-26 | [NET]: Fix multicast device ioctl checks | Patrick McHardy
SIOCADDMULTI/SIOCDELMULTI check whether the driver has a set_multicast_list method to determine whether it supports multicast. Drivers implementing secondary unicast support use set_rx_mode, however. Check for both dev->set_multicast_list and dev->set_rx_mode to determine multicast capabilities. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
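The corrected capability test amounts to accepting either driver hook. This is a userspace sketch with hypothetical names; returning -EINVAL when neither hook exists is an assumption about the ioctl's error code, not something stated in the commit.

```c
#include <errno.h>
#include <stddef.h>

struct model_dev;
typedef void (*rx_cb)(struct model_dev *dev);

struct model_dev {
	rx_cb set_multicast_list;  /* legacy multicast-only hook */
	rx_cb set_rx_mode;         /* hook used by drivers with secondary unicast support */
};

/* Multicast is supported if either hook is implemented. */
static int model_check_multicast(const struct model_dev *dev)
{
	if (!dev->set_multicast_list && !dev->set_rx_mode)
		return -EINVAL;
	return 0;
}

/* Toy hook used to exercise the sketch. */
static void dummy_rx_mode(struct model_dev *dev)
{
	(void)dev;
}
```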
2008-03-26 | [NET] NETNS: Omit namespace comparision without CONFIG_NET_NS. | YOSHIFUJI Hideaki
Introduce an inline net_eq() to compare two namespaces. Without CONFIG_NET_NS, since no namespace other than &init_net exists, it is always 1. We do not need to convert 1) inline vs inline and 2) inline vs &init_net comparisons. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
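The net_eq() idea can be sketched as a compile-time switch. struct net and init_net here are stand-ins for the kernel's types; the sketch is meant to be built without -DCONFIG_NET_NS, in which case every comparison is the constant 1 and the compiler can drop the branch that depends on it.

```c
/* Stand-in for struct net; build without -DCONFIG_NET_NS. */
struct net {
	int id;
};

static struct net init_net = { 0 };

#ifdef CONFIG_NET_NS
static inline int net_eq(const struct net *net1, const struct net *net2)
{
	return net1 == net2;
}
#else
/* Only init_net can exist, so every comparison is true and folds away. */
static inline int net_eq(const struct net *net1, const struct net *net2)
{
	(void)net1;
	(void)net2;
	return 1;
}
#endif
```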
2008-03-26 | [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. | YOSHIFUJI Hideaki
Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
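Along the same lines, dev_net() and dev_net_set() can be sketched so that the nd_net field exists only when namespaces are configured. struct net_device_model is a hypothetical stand-in for struct net_device; built without CONFIG_NET_NS, the setter does nothing and the getter always returns &init_net.

```c
/* Stand-ins for the kernel types; build without -DCONFIG_NET_NS. */
struct net {
	int id;
};

static struct net init_net = { 0 };

struct net_device_model {
#ifdef CONFIG_NET_NS
	struct net *nd_net;  /* present only when namespaces are configured */
#endif
	char name[16];
};

static inline struct net *dev_net(const struct net_device_model *dev)
{
#ifdef CONFIG_NET_NS
	return dev->nd_net;
#else
	(void)dev;
	return &init_net;
#endif
}

static inline void dev_net_set(struct net_device_model *dev, struct net *net)
{
#ifdef CONFIG_NET_NS
	dev->nd_net = net;
#else
	(void)dev;
	(void)net;
#endif
}
```

Routing every access through these helpers means the rest of the code compiles unchanged whichever way the option is set.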
2008-03-24 | [NETNS]: Minor information leak via /proc/net/ptype file. | Pavel Emelyanov
This file displays the registered packet types, but some of them (packet sockets create such) can be bound to a net device, and showing them in the wrong namespace is not correct. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 | [NET]: Add per-connection option to set max TSO frame size | Peter P Waskiewicz Jr
Update: My mailer ate one of Jarek's feedback mails... Fixed the parameter in netif_set_gso_max_size() to be u32, not u16. Fixed the whitespace issue due to a patch import botch. Changed the types from u32 to unsigned int to be more consistent with other variables in the area. Also brought the patch up to the latest net-2.6.26 tree. Update: Made the gso_max_size container 32 bits, not 16. Moved the location of gso_max_size within netdev to be less hotpath. Made the names more consistent between the sock and netdev layers, and added a define for the max GSO size. Update: Respun for net-2.6.26 tree. Update: changed max_gso_frame_size and sk_gso_max_size from signed to unsigned - thanks Stephen! This patch adds the ability for device drivers to control the size of the TSO frames being sent to them, per TCP connection. By setting the netdevice's gso_max_size value, the socket layer will set the GSO frame size based on that value. This will propagate into the TCP layer, and send TSOs of that size to the hardware. This can be desirable to help tune the bursty nature of TSO on a per-adapter basis, where one may have 1 GbE and 10 GbE devices coexisting in a system, one running multiqueue and the other not, etc. This can also be desirable for devices that cannot support full 64 KB TSOs, but still want to benefit from some level of segmentation offloading. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
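A minimal model of how a per-device limit could cap the frames TCP builds, assuming the socket simply copies the device's gso_max_size when it is set up. struct model_dev, struct model_sock, sk_setup_gso() and tso_frame_len() are hypothetical; the real path through the socket and TCP layers is more involved.

```c
/* Hypothetical device and socket models. */
struct model_dev {
	unsigned int gso_max_size;  /* largest GSO frame the device accepts */
};

struct model_sock {
	unsigned int sk_gso_max_size;
};

/* Inherit the device's limit when the connection is routed to it. */
static void sk_setup_gso(struct model_sock *sk, const struct model_dev *dev)
{
	sk->sk_gso_max_size = dev->gso_max_size;
}

/* Largest TSO payload handed down, given how much data is queued. */
static unsigned int tso_frame_len(const struct model_sock *sk, unsigned int queued)
{
	return queued < sk->sk_gso_max_size ? queued : sk->sk_gso_max_size;
}
```

A device that cannot handle full 64 KB TSOs would advertise a smaller gso_max_size, and each connection through it would then build correspondingly smaller frames.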
2008-02-23 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 | Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
  [NETFILTER]: fix ebtable targets return
  [IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.
  [NET]: Restore sanity wrt. print_mac().
  [NEIGH]: Fix race between neighbor lookup and table's hash_rnd update.
  [RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK
  tg3: ethtool phys_id default
  [BNX2]: Update version to 1.7.4.
  [BNX2]: Disable parallel detect on an HP blade.
  [BNX2]: More 5706S link down workaround.
  ssb: Fix support for PCI devices behind a SSB->PCI bridge
  zd1211rw: fix sparse warnings
  rtl818x: fix sparse warnings
  ssb: Fix pcicore cardbus mode
  ssb: Make the GPIO API reentrancy safe
  ssb: Fix the GPIO API
  ssb: Fix watchdog access for devices without a chipcommon
  ssb: Fix serial console on new bcm47xx devices
  ath5k: Fix build warnings on some 64-bit platforms.
  WDEV, ath5k, don't return int from bool function
  WDEV: ath5k, fix lock imbalance
  ...
2008-02-19 | [NET]: Messed multicast lists after dev_mc_sync/unsync | Jorge Boncompte [DTI2]
Commit a0a400d79e3dd7843e7e81baa3ef2957bdc292d0 ("[NET]: dev_mcast: add multicast list synchronization helpers") from you introduced a new field "da_synced" to struct dev_addr_list that is not properly initialized to 0. So when any of the current users (8021q, macvlan, mac80211) calls dev_mc_sync/unsync, they mess up the address list for both devices. The attached patch fixes it for me and avoids future problems. Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
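One general way to avoid this class of bug is to zero-initialize the whole entry, so fields added later start at 0 automatically. This sketch uses calloc(); struct model_addr and model_addr_alloc() are hypothetical, the da_synced field only mirrors the name used in the message, and the actual fix in the patch may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of an address-list entry. */
struct model_addr {
	unsigned char addr[6];
	int users;
	int da_synced;  /* sync counter that must start at 0 */
};

/* calloc() zeroes every field, including ones added after this code was written. */
static struct model_addr *model_addr_alloc(const unsigned char addr[6])
{
	struct model_addr *a = calloc(1, sizeof(*a));

	if (!a)
		return NULL;
	memcpy(a->addr, addr, sizeof(a->addr));
	a->users = 1;
	return a;
}
```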