aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2009-11-04net: Introduce for_each_netdev_rcu() iteratorEric Dumazet
Adds RCU management to the list of netdevices. Convert some for_each_netdev() users to RCU version, if it can avoid read_lock-ing dev_base_lock Ie: read_lock(&dev_base_loack); for_each_netdev(net, dev) some_action(); read_unlock(&dev_base_lock); becomes : rcu_read_lock(); for_each_netdev_rcu(net, dev) some_action(); rcu_read_unlock(); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04em_meta: avoid one dev_put()Eric Dumazet
Another rcu conversion to avoid one dev_hold()/dev_put() pair Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04Phonet: remove tautologiesRémi Denis-Courmont
These checks don't make sense anymore since rtnl_notify() cannot fail. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02ipv6: no more dev_put() in datagram_send_ctl()Eric Dumazet
Avoids touching device refcount in datagram_send_ctl(), thanks to RCU Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02ipv6: no more dev_put() in inet6_bind()Eric Dumazet
Avoids touching device refcount in inet6_bind(), thanks to RCU Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02ip6tnl: less dev_put() callsEric Dumazet
Using dev_get_by_index_rcu() in ip6_tnl_rcv_ctl() & ip6_tnl_xmit_ctl() avoids touching device refcount. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02packet: less dev_put() callsEric Dumazet
- packet_sendmsg_spkt() can use dev_get_by_name_rcu() to avoid touching device refcount. - packet_getname_spkt() & packet_getname() can use dev_get_by_index_rcu() to avoid touching device refcount too. tpacket_snd() & packet_snd() can not use RCU yet because they can sleep when allocating skb. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-01net: RCU locking for simple ioctl()Eric Dumazet
All ioctls() implemented by dev_ifsioc_locked() : SIOCGIFFLAGS, SIOCGIFMETRIC, SIOCGIFMTU, SIOCGIFHWADDR, SIOCGIFSLAVE, SIOCGIFMAP, SIOCGIFINDEX & SIOCGIFTXQLEN can use RCU lock instead of dev_base_lock rwlock Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-01icmp: icmp_send() can avoid a dev_put()Eric Dumazet
We can avoid touching device refcount in icmp_send(), using dev_get_by_index_rcu() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-01ipv4: inetdev_by_index() switch to RCUEric Dumazet
Use dev_get_by_index_rcu() instead of __dev_get_by_index() and dev_base_lock rwlock Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-01veth: Fix unregister_netdevice_queue for vethEric W. Biederman
I tested the recent unregister many changes and got a weird, nasty and seemingly unrelasted kernel oops. Changing unregister_netdevice_queue to use list_move_tail fixes the problem for me. ip link add type veth rmmod veth ls /sys/class/net/ showed one of the veth devices still present. A subsequent ip link oopsed the box. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-01net: Introduce dev_get_by_name_rcu()Eric Dumazet
Some workloads hit dev_base_lock rwlock pretty hard. We can use RCU lookups to avoid touching this rwlock (and avoid touching netdevice refcount) netdevices are already freed after a RCU grace period, so this patch adds no penalty at device dismantle time. However, it adds a synchronize_rcu() call in dev_change_name() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30RDS/IB+IW: Move recv processing to a taskletAndy Grover
Move receive processing from event handler to a tasklet. This should help prevent hangcheck timer from going off when RDS is under heavy load. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30RDS: Do not send congestion updates to loopback connectionsAndy Grover
This issue was discovered by HP's Pradeep and fixed in OFED 1.3, but not fixed in later versions, since the fix's implementation was not immediately applyable to the later code. This patch should do the trick for 1.4+ codebases. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30RDS: Fix panic on unloadAndy Grover
Remove explicit destruction of passive connection when destroying active end of the connection. The passive end is also on the device's connection list, and will thus be cleaned up properly. Panic was caused by trying to clean it up twice. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30RDS: Fix potential race around rds_i[bw]_allocationAndy Grover
"At rds_ib_recv_refill_one(), it first executes atomic_read(&rds_ib_allocation) for if-condition checking, and then executes atomic_inc(&rds_ib_allocation) if the condition was not satisfied. However, if any other code which updates rds_ib_allocation executes between these two atomic operation executions, it seems that it may result race condition. (especially when rds_ib_allocation + 1 == rds_ib_sysctl_max_recv_allocation)" This patch fixes this by using atomic_inc_unless to eliminate the possibility of allocating more than rds_ib_sysctl_max_recv_allocation and then decrementing the count if the allocation fails. It also makes an identical change to the iwarp transport. Reported-by: Shin Hong <hongshin@gmail.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30RDS: Add GET_MR_FOR_DEST sockoptAndy Grover
RDS currently supports a GET_MR sockopt to establish a memory region (MR) for a chunk of memory. However, the fastreg method ties a MR to a particular destination. The GET_MR_FOR_DEST sockopt allows the remote machine to be specified, and thus support for fastreg (aka FRWRs). Note that this patch does *not* do all of this - it simply implements the new sockopt in terms of the old one, so applications can begin to use the new sockopt in preparation for cutover to FRWRs. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30net: Allow devices to specify a device specific sysfs group.Eric W. Biederman
This isn't beautifully abstracted, but it is simple, simplifies uses and so far is only needed for the bonding driver. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30net: use hlist_for_each_entry()Eric Dumazet
Small cleanup of __dev_get_by_name() and __dev_get_by_index() to use hlist_for_each_entry() : They'll look like their _rcu variant. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29vlan: cleanup multiple unregistrationsPatrick McHardy
The temporary copy of the VLAN group is not neccessary since the lower device is already in the process of being unregistered, if it was neccessary the memset of the global group would introduce a race condition. With this removed, the changes to the original code are only a few lines, so remove the new function and move the code back into vlan_device_event(). Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29ax25: unsigned cannot be less than 0 in ax25_ctl_ioctl()roel kluin
struct ax25_ctl_struct member `arg' is unsigned and cannot be less than 0. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29gro: Change all receive functions to return GRO result codesBen Hutchings
This will allow drivers to adjust their receive path dynamically based on whether GRO is being applied successfully. Currently all in-tree callers ignore the return values of these functions and do not need to be changed. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29gro: Name the GRO result enumeration typeBen Hutchings
This clarifies which return and parameter types are GRO result codes and not RX result codes. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2009-10-29net: Fix 'Re: PACKET_TX_RING: packet size is too long'Gabor Gombas
Currently PACKET_TX_RING forces certain amount of every frame to remain unused. This probably originates from an early version of the PACKET_TX_RING patch that in fact used the extra space when the (since removed) CONFIG_PACKET_MMAP_ZERO_COPY option was enabled. The current code does not make any use of this extra space. This patch removes the extra space reservation and lets userspace make use of the full frame size. Signed-off-by: Gabor Gombas <gombasg@sztaki.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29net,socket: introduce DECLARE_SOCKADDR helper to catch overflow at build timeCyrill Gorcunov
proto_ops->getname implies copying protocol specific data into storage unit (particulary to __kernel_sockaddr_storage). So when we implement new protocol support we should keep such a detail in mind (which is easy to forget about). Lets introduce DECLARE_SOCKADDR helper which check if storage unit is not overfowed at build time. Eventually inet_getname is switched to use DECLARE_SOCKADDR (to show example of usage). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2009-10-29net: Introduce dev_get_by_index_rcu()Eric Dumazet
Some workloads hit dev_base_lock rwlock pretty hard. We can use RCU lookups to avoid touching this rwlock. netdevices are already freed after a RCU grace period, so this patch adds no penalty at device dismantle time. dev_ifname() converted to dev_get_by_index_rcu() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29net: Cleanup redundant tests on unsignedroel kluin
optlen is unsigned so the `< 0' test is never true. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29net: Cleanup redundant tests on unsignedroel kluin
If there is data, the unsigned skb->len is greater than 0. rt.sigdigits is unsigned as well, so the test `>= 0' is always true, the other part of the test catches wrapped values. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Allow disabling of DSACK TCP option per routeGilad Ben-Yossef
Add and use no DSCAK bit in the features field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Allow to turn off TCP window scale opt per routeGilad Ben-Yossef
Add and use no window scale bit in the features field. Note that this is not the same as setting a window scale of 0 as would happen with window limit on route. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Allow disabling TCP timestamp options per routeGilad Ben-Yossef
Implement querying and acting upon the no timestamp bit in the feature field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Add the no SACK route option featureGilad Ben-Yossef
Implement querying and acting upon the no sack bit in the features field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Allow tcp_parse_options to consult dst entryGilad Ben-Yossef
We need tcp_parse_options to be aware of dst_entry to take into account per dst_entry TCP options settings Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Only parse time stamp TCP option in time wait sockGilad Ben-Yossef
Since we only use tcp_parse_options here to check for the exietence of TCP timestamp option in the header, it is better to call with the "established" flag on. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Signed-off-by: Ori Finkelman <ori@comsleep.com> Signed-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29ip6mr: Optimize multiple unregistrationEric Dumazet
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29ipv6 sit: Optimize multiple unregistrationEric Dumazet
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29ipmr: Optimize multiple unregistrationEric Dumazet
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29ip6tnl: Optimize multiple unregistrationEric Dumazet
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29bridge: Optimize multiple unregistrationEric Dumazet
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2)Neil Horman
Augment raw_send_hdrinc to correct for incorrect ip header length values A series of oopses was reported to me recently. Apparently when using AF_RAW sockets to send data to peers that were reachable via ipsec encapsulation, people could panic or BUG halt their systems. I've tracked the problem down to user space sending an invalid ip header over an AF_RAW socket with IP_HDRINCL set to 1. Basically what happens is that userspace sends down an ip frame that includes only the header (no data), but sets the ip header ihl value to a large number, one that is larger than the total amount of data passed to the sendmsg call. In raw_send_hdrincl, we allocate an skb based on the size of the data in the msghdr that was passed in, but assume the data is all valid. Later during ipsec encapsulation, xfrm4_tranport_output moves the entire frame back in the skbuff to provide headroom for the ipsec headers. During this operation, the skb->transport_header is repointed to a spot computed by skb->network_header + the ip header length (ihl). Since so little data was passed in relative to the value of ihl provided by the raw socket, we point transport header to an unknown location, resulting in various crashes. This fix for this is pretty straightforward, simply validate the value of of iph->ihl when sending over a raw socket. If (iph->ihl*4U) > user data buffer size, drop the frame and return -EINVAL. I just confirmed this fixes the reported crashes. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29vlan: Add support to netdev_ops.ndo_fcoe_get_wwn for VLAN deviceYi Zou
Implements the netdev_ops.ndo_fcoe_get_wwn for VLAN device. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28net: Corrected spelling error heurestics->heuristicsAndreas Petlund
Corrected a spelling error in a function name. Signed-off-by: Andreas Petlund <apetlund@simula.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28net: sysfs: ethtool_ops can be NULLEric Dumazet
commit d519e17e2d01a0ee9abe083019532061b4438065 (net: export device speed and duplex via sysfs) made the wrong assumption that netdev->ethtool_ops was always set. This makes possible to crash kernel and let rtnl in locked state. modprobe dummy ip link set dummy0 up (udev runs and crash) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28gre: Optimize multiple unregistrationEric Dumazet
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28ipip: Optimize multiple unregistrationEric Dumazet
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28vlan: Optimize multiple unregistrationEric Dumazet
Use unregister_netdevice_many() to speedup master device unregister. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28net: add a list_head parameter to dellink() methodEric Dumazet
Adding a list_head parameter to rtnl_link_ops->dellink() methods allow us to queue devices on a list, in order to dismantle them all at once. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28net: Introduce unregister_netdevice_many()Eric Dumazet
Introduce rollback_registered_many() and unregister_netdevice_many() rollback_registered_many() is able to perform necessary steps at device dismantle time, factorizing two expensive synchronize_net() calls. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>