aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2008-07-02netdev: Add support for rx flow hash configuration, using ethtool.Santwona Behera
Added new interfaces to ethtool to configure receive network flow distribution across multiple rx rings using hashing. Signed-off-by: Santwona Behera <santwona.behera@sun.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-01sctp: Mark GET_PEER|LOCAL_ADDR_OLD deprecated.Vlad Yasevich
Socket options SCTP_GET_PEER_ADDR_OLD, SCTP_GET_PEER_ADDR_NUM_OLD, SCTP_GET_LOCAL_ADDR_OLD, and SCTP_GET_PEER_LOCAL_ADDR_NUM_OLD have been replaced by newer versions a since 2005. It's time to officially deprecate them and schedule them for removal. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-01icmp: fix units for ratelimitStephen Hemminger
Convert the sysctl values for icmp ratelimit to use milliseconds instead of jiffies which is based on kernel configured HZ. Internal kernel jiffies are not a proper unit for any userspace API. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-28Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2008-06-28Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl4965-base.c
2008-06-27ipv6 route: Convert rt6_device_match() to use RT6_LOOKUP_F_xxx flags.YOSHIFUJI Hideaki
The commit 77d16f450ae0452d7d4b009f78debb1294fb435c ("[IPV6] ROUTE: Unify RT6_F_xxx and RT6_SELECT_F_xxx flags") intended to pass various routing lookup hints around RT6_LOOKUP_F_xxx flags, but conversion was missing for rt6_device_match(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27netlabel: Fix a problem when dumping the default IPv6 static labelsPaul Moore
There is a missing "!" in a conditional statement which is causing entries to be skipped when dumping the default IPv6 static label entries. This can be demonstrated by running the following: # netlabelctl unlbl add default address:::1 \ label:system_u:object_r:unlabeled_t:s0 # netlabelctl -p unlbl list ... you will notice that the entry for the IPv6 localhost address is not displayed but does exist (works correctly, causes collisions when attempting to add duplicate entries, etc.). Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27net/inet_lro: remove setting skb->ip_summed when not LRO-ableEli Cohen
When an SKB cannot be chained to a session, the current code attempts to "restore" its ip_summed field from lro_mgr->ip_summed. However, lro_mgr->ip_summed does not hold the original value; in fact, we'd better not touch skb->ip_summed since it is not modified by the code in the path leading to a failure to chain it. Also use a cleaer comment to the describe the ip_summed field of struct net_lro_mgr. Issue raised by Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27inet fragments: fix race between inet_frag_find and inet_frag_secret_rebuildPavel Emelyanov
The problem is that while we work w/o the inet_frags.lock even read-locked the secret rebuild timer may occur (on another CPU, since BHs are still disabled in the inet_frag_find) and change the rnd seed for ipv4/6 fragments. It was caused by my patch fd9e63544cac30a34c951f0ec958038f0529e244 ([INET]: Omit double hash calculations in xxx_frag_intern) late in the 2.6.24 kernel, so this should probably be queued to -stable. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27netlink: Fix some doc comments in net/netlink/attr.cJulius Volz
Fix some doc comments to match function and attribute names in net/netlink/attr.c. Signed-off-by: Julius Volz <juliusv@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27tcp: /proc/net/tcp rto,ato values not scaled properly (v2)Stephen Hemminger
I found another case where we are sending information to userspace in the wrong HZ scale. This should have been fixed back in 2.5 :-( This means an ABI change but as it stands there is no way for an application like ss to get the right value. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27pkt_sched: Remove CONFIG_NET_SCH_RRAdrian Bunk
Commit d62733c8e437fdb58325617c4b3331769ba82d70 ([SCHED]: Qdisc changes and sch_rr added for multiqueue) added a NET_SCH_RR option that was unused since the code went unconditionally into sch_prio. Reported-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27pkt_sched: ERR_PTR() ususally encodes an negative errno, not positive.WANG Cong
Note, in the following patch, 'err' is initialized as: int err = -ENOBUFS; Signed-off-by: WANG Cong <wcong@critical-links.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27netdevice: Fix typo of dev_unicast_add() commentWang Chen
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27af_unix: fix 'poll for write'/connected DGRAM socketsRainer Weikusat
For n:1 'datagram connections' (eg /dev/log), the unix_dgram_sendmsg routine implements a form of receiver-imposed flow control by comparing the length of the receive queue of the 'peer socket' with the max_ack_backlog value stored in the corresponding sock structure, either blocking the thread which caused the send-routine to be called or returning EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET sockets. The poll-implementation for these socket types is datagram_poll from core/datagram.c. A socket is deemed to be writeable by this routine when the memory presently consumed by datagrams owned by it is less than the configured socket send buffer size. This is always wrong for PF_UNIX non-stream sockets connected to server sockets dealing with (potentially) multiple clients if the abovementioned receive queue is currently considered to be full. 'poll' will then return, indicating that the socket is writeable, but a subsequent write result in EAGAIN, effectively causing an (usual) application to 'poll for writeability by repeated send request with O_NONBLOCK set' until it has consumed its time quantum. The change below uses a suitably modified variant of the datagram_poll routines for both type of PF_UNIX sockets, which tests if the recv-queue of the peer a socket is connected to is presently considered to be 'full' as part of the 'is this socket writeable'-checking code. The socket being polled is additionally put onto the peer_wait wait queue associated with its peer, because the unix_dgram_recvmsg routine does a wake up on this queue after a datagram was received and the 'other wakeup call' is done implicitly as part of skb destruction, meaning, a process blocked in poll because of a full peer receive queue could otherwise sleep forever if no datagram owned by its socket was already sitting on this queue. Among this change is a small (inline) helper routine named 'unix_recvq_full', which consolidates the actual testing code (in three different places) into a single location. Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27tcp: fix for splice receive when used with software LROOctavian Purdila
If an skb has nr_frags set to zero but its frag_list is not empty (as it can happen if software LRO is enabled), and a previous tcp_read_sock has consumed the linear part of the skb, then __skb_splice_bits: (a) incorrectly reports an error and (b) forgets to update the offset to account for the linear part Any of the two problems will cause the subsequent __skb_splice_bits call (the one that handles the frag_list skbs) to either skip data, or, if the unadjusted offset is greater then the size of the next skb in the frag_list, make tcp_splice_read loop forever. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27tcp: calculate tcp_mem based on low memory instead of all memoryMiquel van Smoorenburg
The tcp_mem array which contains limits on the total amount of memory used by TCP sockets is calculated based on nr_all_pages. On a 32 bits x86 system, we should base this on the number of lowmem pages. Signed-off-by: Miquel van Smoorenburg <miquels@cistron.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27mac80211: fix an oops in several failure paths in key allocationEmmanuel Grumbach
This patch fixes an oops in several failure paths in key allocation. This Oops occurs when freeing a key that has not been linked yet, so the key->sdata is not set. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27mac80211: fix tx fragmentationJohannes Berg
This patch fixes TX fragmentation caused by tx handlers reordering and 'tx info to cb' patches Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27mac80211: make workqueue freezableJohannes Berg
This patch makes the mac80211 workqueue freezable making it interact a bit better with system suspend and not try to ping the AP while the hardware is down. This doesn't really help with implementing proper suspend in any way but makes some bad things trigger less. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27mac80211: add phy information to giwnameTomas Winkler
This patch add phy information to giwname. Quoting: It's not useless, it's supposed to tell you about the protocol capability of the device, like "IEEE 802.11b" or "IEEE 802.11abg" Jean Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27mac80211: update the authentication methodEmmanuel Grumbach
This patch updates the authentication method upon giwencode ioctl. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27mac80211: don't return -EINVAL upon iwconfig wlan0 rts autoEmmanuel Grumbach
This patch avoids returning -EINVAL upon iwconfig wlan0 rts auto. If rts->fixed is 0, then we should choose a default value instead of failing. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27mac80211: mlme.c use new frame control helpersHarvey Harrison
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27mac80211: rx.c use new frame control helpersHarvey Harrison
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27mac80211: tx.c use new frame control helpersHarvey Harrison
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27mac80211: wep.c use new frame control helpersHarvey Harrison
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27mac80211: Let drivers have access to TKIP key offets for TX and RX MICLuis R. Rodriguez
Some drivers may want to to use the TKIP key offsets for TX and RX MIC so lets move this out. Lets also clear up a bit how this is used internally in mac80211. Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27mac80211: rename TKIP debugging Kconfig symbolJohannes Berg
... to MAC80211_TKIP_DEBUG rather than TKIP_DEBUG. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26mac80211: add single function calling tx handlersJohannes Berg
This modifies mac80211 to only have a single function calling the TX handlers rather than them being invoked in multiple places. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26mac80211: use separate spinlock for sta flagsJohannes Berg
David Ellingsworth posted a bug that was only noticable on UP/NO-PREEMPT and Michael correctly analysed it to be a spin_lock_bh() section within a spin_lock_irqsave() section. This adds a separate spinlock for the sta_info flags to fix that issue and avoid having to take much care about where the sta flag manipulation functions are called. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reported-By: David Ellingsworth <david@identd.dyndns.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26mac80211: remove shared key todoJohannes Berg
Adding shared key authentication is not going to happen anyway. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26mac80211: 11h - Handling measurement requestAssaf Krauss
This patch handles the 11h measurement request information element. This is minimal requested implementation - refuse measurement. Signed-off-by: Assaf Krauss <assaf.krauss@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26mac80211: 11h Infrastructure - ParsingAssaf Krauss
This patch introduces parsing of 11h and 11d related elements from incoming management frames. Signed-off-by: Assaf Krauss <assaf.krauss@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26rfkill: rename the rfkill_state states and add block-locked stateHenrique de Moraes Holschuh
The current naming of rfkill_state causes a lot of confusion: not only the "kill" in rfkill suggests negative logic, but also the fact that rfkill cannot turn anything on (it can just force something off or stop forcing something off) is often forgotten. Rename RFKILL_STATE_OFF to RFKILL_STATE_SOFT_BLOCKED (transmitter is blocked and will not operate; state can be changed by a toggle_radio request), and RFKILL_STATE_ON to RFKILL_STATE_UNBLOCKED (transmitter is not blocked, and may operate). Also, add a new third state, RFKILL_STATE_HARD_BLOCKED (transmitter is blocked and will not operate; state cannot be changed through a toggle_radio request), which is used by drivers to indicate a wireless transmiter was blocked by a hardware rfkill line that accepts no overrides. Keep the old names as #defines, but document them as deprecated. This way, drivers can be converted to the new names *and* verified to actually use rfkill correctly one by one. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26rfkill: do not allow userspace to override ALL RADIOS OFFHenrique de Moraes Holschuh
SW_RFKILL_ALL is the "emergency power-off all radios" input event. It must be handled, and must always do the same thing as far as the rfkill system is concerned: all transmitters are to go *immediately* offline. For safety, do NOT allow userspace to override EV_SW SW_RFKILL_ALL OFF. As long as rfkill-input is loaded, that event will *always* be processed, and it will *always* force all rfkill switches to disable all wireless transmitters, regardless of user_claim attribute or anything else. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26rfkill: drop current_state from tasks in rfkill-inputFabien Crespel
The whole current_state thing seems completely useless and a source of problems in rfkill-input, since state comparison is already done in rfkill, and rfkill-input is more than likely to become out of sync with the real state. Signed-off-by: Fabien Crespel <fabien@crespel.net> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26rfkill: add uevent notificationsHenrique de Moraes Holschuh
Use the notification chains to also send uevents, so that userspace can be notified of state changes of every rfkill switch. Userspace should use these events for OSD/status report applications and rfkill GUI frontends. HAL might want to broadcast them over DBUS, for example. It might be also useful for userspace implementations of rfkill-input, or to use HAL as the platform driver which promotes rfkill switch change events into input events (to synchronize all other switches) when necessary for platforms that lack a convenient platform-specific kernel module to do it. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26rfkill: add type string helperHenrique de Moraes Holschuh
We will need access to the rfkill switch type in string format for more than just sysfs. Therefore, move it to a generic helper. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26rfkill: add notifier chains supportHenrique de Moraes Holschuh
Add a notifier chain for use by the rfkill class. This notifier chain signals the following events (more to be added when needed): 1. rfkill: rfkill device state has changed A pointer to the rfkill struct will be passed as a parameter. The notifier message types have been added to include/linux/rfkill.h instead of to include/linux/notifier.h in order to avoid the madness of modifying a header used globally (and that triggers an almost full tree rebuild every time it is touched) with information that is of interest only to code that includes the rfkill.h header. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26rfkill: rework suspend and resume handlersHenrique de Moraes Holschuh
The resume handler should reset the wireless transmitter rfkill state to exactly what it was when the system was suspended. Do it, and do it using the normal routines for state change while at it. The suspend handler should force-switch the transmitter to blocked state, ignoring caches. Do it. Also take an opportunity shot to rfkill_remove_switch() and also force the transmitter to blocked state there, bypassing caches. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26rfkill: add the WWAN radio typeHenrique de Moraes Holschuh
Unfortunately, instead of adding a generic Wireless WAN type, a technology- specific type (WiMAX) was added. That's useless for other WWAN devices, such as EDGE, UMTS, X-RTT and other such radios. Add a WWAN rfkill type for generic wireless WAN devices. No keys are added as most devices really want to use KEY_WLAN for WWAN control (in a cycle of none, WLAN, WWAN, WLAN+WWAN) and need no specific keycode added. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Iñaky Pérez-González <inaky.perez-gonzalez@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26rfkill: add read-write rfkill switch supportHenrique de Moraes Holschuh
Currently, rfkill support for read/write rfkill switches is hacked through a round-trip over the input layer and rfkill-input to let a driver sync rfkill->state to hardware changes. This is buggy and sub-optimal. It causes real problems. It is best to think of the rfkill class as supporting only write-only switches at the moment. In order to implement the read/write functionality properly: Add a get_state() hook that is called by the class every time it needs to fetch the current state of the switch. Add a call to this hook every time the *current* state of the radio plays a role in a decision. Also add a force_state() method that can be used to forcefully syncronize the class' idea of the current state of the switch. This allows for a faster implementation of the read/write functionality, as a driver which get events on switch changes can avoid the need for a get_state() hook. If the get_state() hook is left as NULL, current behaviour is maintained, so this change is fully backwards compatible with the current rfkill drivers. For hardware that issues events when the rfkill state changes, leave get_state() NULL in the rfkill struct, set the initial state properly before registering with the rfkill class, and use the force_state() method in the driver to keep the rfkill interface up-to-date. get_state() can be called by the class from atomic context. It must not sleep. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26rfkill: add parameter to disable radios by defaultHenrique de Moraes Holschuh
Currently, radios are always enabled when their rfkill interface is registered. This is not optimal, the safest state for a radio is to be offline unless the user turns it on. Add a module parameter that causes all radios to be disabled when their rfkill interface is registered. The module default is not changed so unless the parameter is used, radios will still be forced to their enabled state when they are registered. The new rfkill module parameter is called "default_state". Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26rfkill: handle SW_RFKILL_ALL eventsHenrique de Moraes Holschuh
Teach rfkill-input how to handle SW_RFKILL_ALL events (new name for the SW_RADIO event). SW_RFKILL_ALL is an absolute enable-or-disable command that is tied to all radios in a system. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26rfkill: fix minor typo in kernel docHenrique de Moraes Holschuh
Fix a minor typo in an exported function documentation Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-25Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/wireless-2.6John W. Linville
2008-06-25mac80211: implement EU regulatory domainTony Vroon
Implement missing EU regulatory domain for mac80211. Based on the information in IEEE 802.11-2007 (specifically pages 1142, 1143 & 1148) and ETSI 301 893 (V1.4.1). With thanks to Johannes Berg. Signed-off-by: Tony Vroon <tony@linx.net> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-24netfilter: ip6table_mangle: don't reroute in LOCAL_INPatrick McHardy
Rerouting should only happen in LOCAL_OUT, in INPUT its useless since the packet has already chosen its final destination. Noticed by Alexey Dobriyan <adobriyan@gmail.com>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-20netns: Don't receive new packets in a dead network namespace.Eric W. Biederman
Alexey Dobriyan <adobriyan@gmail.com> writes: > Subject: ICMP sockets destruction vs ICMP packets oops > After icmp_sk_exit() nuked ICMP sockets, we get an interrupt. > icmp_reply() wants ICMP socket. > > Steps to reproduce: > > launch shell in new netns > move real NIC to netns > setup routing > ping -i 0 > exit from shell > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 > IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30 > PGD 17f3cd067 PUD 17f3ce067 PMD 0 > Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC > CPU 0 > Modules linked in: usblp usbcore > Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4 > RIP: 0010:[<ffffffff803fce17>] [<ffffffff803fce17>] icmp_sk+0x17/0x30 > RSP: 0018:ffffffff8057fc30 EFLAGS: 00010286 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900 > RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800 > RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815 > R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28 > R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800 > FS: 0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0) > Stack: 0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4 > ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246 > 000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436 > Call Trace: > <IRQ> [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0 > [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360 > [<ffffffff803fd645>] icmp_echo+0x65/0x70 > [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0 > [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0 > [<ffffffff803d71bb>] ip_rcv+0x33b/0x650 > [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340 > [<ffffffff803be57d>] process_backlog+0x9d/0x100 > [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250 > [<ffffffff80237be5>] __do_softirq+0x75/0x100 > [<ffffffff8020c97c>] call_softirq+0x1c/0x30 > [<ffffffff8020f085>] do_softirq+0x65/0xa0 > [<ffffffff80237af7>] irq_exit+0x97/0xa0 > [<ffffffff8020f198>] do_IRQ+0xa8/0x130 > [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60 > [<ffffffff8020bc46>] ret_from_intr+0x0/0xf > <EOI> [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60 > [<ffffffff80212f23>] ? mwait_idle+0x43/0x60 > [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0 > [<ffffffff8040f380>] ? rest_init+0x70/0x80 > Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 > 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08 > 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00 > RIP [<ffffffff803fce17>] icmp_sk+0x17/0x30 > RSP <ffffffff8057fc30> > CR2: 0000000000000000 > ---[ end trace ea161157b76b33e8 ]--- > Kernel panic - not syncing: Aiee, killing interrupt handler! Receiving packets while we are cleaning up a network namespace is a racy proposition. It is possible when the packet arrives that we have removed some but not all of the state we need to fully process it. We have the choice of either playing wack-a-mole with the cleanup routines or simply dropping packets when we don't have a network namespace to handle them. Since the check looks inexpensive in netif_receive_skb let's just drop the incoming packets. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>