aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2009-05-04Bluetooth: Fix issue with sysfs handling for connectionsMarcel Holtmann
Due to a semantic changes in flush_workqueue() the current approach of synchronizing the sysfs handling for connections doesn't work anymore. The whole approach is actually fully broken and based on assumptions that are no longer valid. With the introduction of Simple Pairing support, the creation of low-level ACL links got changed. This change invalidates the reason why in the past two independent work queues have been used for adding/removing sysfs devices. The adding of the actual sysfs device is now postponed until the host controller successfully assigns an unique handle to that link. So the real synchronization happens inside the controller and not the host. The only left-over problem is that some internals of the sysfs device handling are not initialized ahead of time. This leaves potential access to invalid data and can cause various NULL pointer dereferences. To fix this a new function makes sure that all sysfs details are initialized when an connection attempt is made. The actual sysfs device is only registered when the connection has been successfully established. To avoid a race condition with the registration, the check if a device is registered has been moved into the removal work. As an extra protection two flush_work() calls are left in place to make sure a previous add/del work has been completed first. Based on a report by Marc Pignat <marc.pignat@hevs.ch> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Tested-by: Justin P. Mattock <justinmattock@gmail.com> Tested-by: Roger Quadros <ext-roger.quadros@nokia.com> Tested-by: Marc Pignat <marc.pignat@hevs.ch>
2009-05-04mac80211: pid, fix memory corruptionJiri Slaby
pid doesn't count with some band having more bitrates than the one associated the first time. Fix that by counting the maximal available bitrate count and allocate big enough space. Secondly, fix touching uninitialized memory which causes panics. Index sucked from this random memory points to the hell. The fix is to sort the rates on each band change. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-04mac80211: minstrel, fix memory corruptionJiri Slaby
minstrel doesn't count max rate count in fact, since it doesn't use a loop variable `i' and hence allocs space only for bitrates found in the first band. Fix it by involving the `i' as an index so that it traverses all the bands now and finds the real max bitrate count. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-04cfg80211: fix comment on regulatory hint processingLuis R. Rodriguez
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-04cfg80211: fix bug while trying to process beacon hints on initLuis R. Rodriguez
During initialization we would not have received any beacons so skip processing reg beacon hints, also adds a check to reg_is_world_roaming() for last_request before accessing its fields. This should fix this: BUG: unable to handle kernel NULL pointer dereference at IP: [<e0171332>] wiphy_update_regulatory+0x20f/0x295 *pdpt = 0000000008bf1001 *pde = 0000000000000000 Oops: 0000 [#1] last sysfs file: /sys/class/backlight/eeepc/brightness Modules linked in: ath5k(+) mac80211 led_class cfg80211 go_bit cfbcopyarea cfbimgblt cfbfillrect ipv6 ydev usual_tables(P) snd_hda_codec_realtek snd_hda_intel nd_hwdep uhci_hcd snd_pcm_oss snd_mixer_oss i2c_i801 e serio_raw i2c_core pcspkr atl2 snd_pcm intel_agp re agpgart eeepc_laptop snd_page_alloc ac video backlight rfkill button processor evdev thermal fan ata_generic Pid: 2909, comm: modprobe Tainted: Pc #112) 701 EIP: 0060:[<e0171332>] EFLAGS: 00010246 CPU: 0 EIP is at wiphy_update_regulatory+0x20f/0x295 [cfg80211] EAX: 00000000 EBX: c5da0000 ECX: 00000000 EDX: c5da0060 ESI: 0000001a EDI: c5da0060 EBP: df3bdd70 ESP: df3bdd40 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Process modprobe (pid: 2909, ti=df3bc000 task=c5d030000) Stack: df3bdd90 c5da0060 c04277e0 00000001 00000044 c04277e402 00000002 c5da0000 0000001a c5da0060 df3bdda8 e01706a2 02 00000282 000080d0 00000068 c5d53500 00000080 0000028240 Call Trace: [<e01706a2>] ? wiphy_register+0x122/0x1b7 [cfg80211] [<e0328e02>] ? ieee80211_register_hw+0xd8/0x346 [<e06a7c9f>] ? ath5k_hw_set_bssid_mask+0x71/0x78 [ath5k] [<e06b0c52>] ? ath5k_pci_probe+0xa5c/0xd0a [ath5k] [<c01a6037>] ? sysfs_find_dirent+0x16/0x27 [<c01fec95>] ? local_pci_probe+0xe/0x10 [<c01ff526>] ? pci_device_probe+0x48/0x66 [<c024c9fd>] ? driver_probe_device+0x7f/0xf2 [<c024cab3>] ? __driver_attach+0x43/0x5f [<c024c0af>] ? bus_for_each_dev+0x39/0x5a [<c024c8d0>] ? driver_attach+0x14/0x16 [<c024ca70>] ? __driver_attach+0x0/0x5f [<c024c5b3>] ? bus_add_driver+0xd7/0x1e7 [<c024ccb9>] ? driver_register+0x7b/0xd7 [<c01ff827>] ? __pci_register_driver+0x32/0x85 [<e00a8018>] ? init_ath5k_pci+0x18/0x30 [ath5k] [<c0101131>] ? _stext+0x49/0x10b [<e00a8000>] ? init_ath5k_pci+0x0/0x30 [ath5k] [<c012f452>] ? __blocking_notifier_call_chain+0x40/0x4c [<c013a714>] ? sys_init_module+0x87/0x18b [<c0102804>] ? sysenter_do_call+0x12/0x22 Code: b8 da 17 e0 83 c0 04 e8 92 f9 ff ff 84 c0 75 2a 8b 85 c0 74 0c 83 c0 04 e8 7c f9 ff ff 84 c0 75 14 a1 bc da 4 03 74 66 8b 4d d4 80 79 08 00 74 5d a1 e0 d2 17 e0 48 EIP: [<e0171332>] wiphy_update_regulatory+0x20f/0x295 SP 0068:df3bdd40 CR2: 0000000000000004 ---[ end trace 830f2dd2a95fd1a8 ]--- This issue is hard to reproduce, but it was noticed and discussed on this thread: http://marc.info/?t=123938022700005&r=1&w=2 Cc: stable@kernel.org Reported-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-04cfg80211: fix race condition with wiphy_apply_custom_regulatory()Luis R. Rodriguez
We forgot to lock using the cfg80211_mutex in wiphy_apply_custom_regulatory(). Without the lock there is possible race between processing a reply from CRDA and a driver calling wiphy_apply_custom_regulatory(). During the processing of the reply from CRDA we free last_request and wiphy_apply_custom_regulatory() eventually accesses an element from last_request in the through freq_reg_info_regd(). This is very difficult to reproduce (I haven't), it takes us 3 hours and you need to be banging hard, but the race is obvious by looking at the code. This should only affect those who use this caller, which currently is ath5k, ath9k, and ar9170. EIP: 0060:[<f8ebec50>] EFLAGS: 00210282 CPU: 1 EIP is at freq_reg_info_regd+0x24/0x121 [cfg80211] EAX: 00000000 EBX: f7ca0060 ECX: f5183d94 EDX: 0024cde0 ESI: f8f56edc EDI: 00000000 EBP: 00000000 ESP: f5183d44 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process modprobe (pid: 14617, ti=f5182000 task=f3934d10 task.ti=f5182000) Stack: c0505300 f7ca0ab4 f5183d94 0024cde0 f8f403a6 f8f63160 f7ca0060 00000000 00000000 f8ebedf8 f5183d90 f8f56edc 00000000 00000004 00000f40 f8f56edc f7ca0060 f7ca1234 00000000 00000000 00000000 f7ca14f0 f7ca0ab4 f7ca1289 Call Trace: [<f8ebedf8>] wiphy_apply_custom_regulatory+0x8f/0x122 [cfg80211] [<f8f3f798>] ath_attach+0x707/0x9e6 [ath9k] [<f8f45e46>] ath_pci_probe+0x18d/0x29a [ath9k] [<c023c7ba>] pci_device_probe+0xa3/0xe4 [<c02a860b>] really_probe+0xd7/0x1de [<c02a87e7>] __driver_attach+0x37/0x55 [<c02a7eed>] bus_for_each_dev+0x31/0x57 [<c02a83bd>] driver_attach+0x16/0x18 [<c02a78e6>] bus_add_driver+0xec/0x21b [<c02a8959>] driver_register+0x85/0xe2 [<c023c9bb>] __pci_register_driver+0x3c/0x69 [<f8e93043>] ath9k_init+0x43/0x68 [ath9k] [<c010112b>] _stext+0x3b/0x116 [<c014a872>] sys_init_module+0x8a/0x19e [<c01049ad>] sysenter_do_call+0x12/0x21 [<ffffe430>] 0xffffe430 ======================= Code: 0f 94 c0 c3 31 c0 c3 55 57 56 53 89 c3 83 ec 14 8b 74 24 2c 89 54 24 0c 89 4c 24 08 85 f6 75 06 8b 35 c8 bb ec f8 a1 cc bb ec f8 <8b> 40 04 83 f8 03 74 3a 48 74 37 8b 43 28 85 c0 74 30 89 c6 8b EIP: [<f8ebec50>] freq_reg_info_regd+0x24/0x121 [cfg80211] SS:ESP 0068:f5183d44 Cc: stable@kernel.org Reported-by: Nataraj Sadasivam <Nataraj.Sadasivam@Atheros.com> Reported-by: Vivek Natarajan <Vivek.Natarajan@Atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-04cfg80211: fix truncated IEsJohannes Berg
Another bug in the "cfg80211: do not replace BSS structs" patch, a forgotten length update leads to bogus data being stored and passed to userspace, often truncated. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-04mac80211: correct fragmentation threshold checkJohannes Berg
The fragmentation threshold is defined to be including the FCS, and the code that sets the TX_FRAGMENTED flag correctly accounts for those four bytes. The code that verifies this doesn't though, which could lead to spurious warnings and frames being dropped although everything is ok. Correct the code by accounting for the FCS. (JWL -- The problem is described here: http://article.gmane.org/gmane.linux.kernel.wireless.general/32205 ) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-04netns 2/2: extract net_create()Alexey Dobriyan
net_create() will be used by C/R to create fresh netns on restart. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-04netns 1/2: don't get/put old netns on CLONE_NEWNETAlexey Dobriyan
copy_net_ns() doesn't copy anything, it creates fresh netns, so get/put of old netns isn't needed. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-04tcp: Fix tcp_prequeue() to get correct rto_min valueSatoru SATOH
tcp_prequeue() refers to the constant value (TCP_RTO_MIN) regardless of the actual value might be tuned. The following patches fix this and make tcp_prequeue get the actual value returns from tcp_rto_min(). Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-04tcp: extend ECN sysctl to allow server-side only ECNIlpo Järvinen
This should be very safe compared with full enabled, so I see no reason why it shouldn't be done right away. As ECN can only be negotiated if the SYN sending party is also supporting it, somebody in the loop probably knows what he/she is doing. If SYN does not ask for ECN, the server side SYN-ACK is identical to what it is without ECN. Thus it's quite safe. The chosen value is safe w.r.t to existing configs which choose to currently set manually either 0 or 1 but silently upgrades those who have not explicitly requested ECN off. Whether to just enable both sides comes up time to time but unless that gets done now we can at least make the servers aware of ECN already. As there are some known problems to occur if ECN is enabled, it's currently questionable whether there's any real gain from enabling clients as servers mostly won't support it anyway (so we'd hit just the negative sides). After enabling the servers and getting that deployed, the client end enable really has some potential gain too. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-03net: Avoid modulus in skb_tx_hash() for forwarding case.David S. Miller
Based almost entirely upon a patch by Eric Dumazet. The common case is to have num-tx-queues <= num_rx_queues and even if num_tx_queues is larger it will not be significantly larger. Therefore, a subtraction loop is always going to be faster than modulus. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-03Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2009-05-03Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2009-05-02Subject: [PATCH] br2684: restore net_dev initializationRabin Vincent
Commit 0ba25ff4c669e5395110ba6ab4958a97a9f96922 ("br2684: convert to net_device_ops") inadvertently deleted the initialization of the net_dev pointer in the br2684_dev structure, leading to crashes. This patch adds it back. Reported-by: Mikko Vinni <mmvinni@yahoo.com> Tested-by: Mikko Vinni <mmvinni@yahoo.com> Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-02net: Only store high 16 bits of kernel generated filter prioritiesRobert Love
The kernel should only be using the high 16 bits of a kernel generated priority. Filter priorities in all other cases only use the upper 16 bits of the u32 'prio' field of 'struct tcf_proto', but when the kernel generates the priority of a filter is saves all 32 bits which can result in incorrect lookup failures when a filter needs to be deleted or modified. Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-01xt_socket: checks for the state of nf_conntrackLaszlo Attila Toth
xt_socket can use connection tracking, and checks whether it is a module. Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-01net: Fix skb_tx_hash() for forwarding workloads.Eric Dumazet
When skb_rx_queue_recorded() is true, we dont want to use jash distribution as the device driver exactly told us which queue was selected at RX time. jhash makes a statistical shuffle, but this wont work with 8 static inputs. Later improvements would be to compute reciprocal value of real_num_tx_queues to avoid a divide here. But this computation should be done once, when real_num_tx_queues is set. This needs a separate patch, and a new field in struct net_device. Reported-by: Andrew Dickinson <andrew@whydna.net> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-30net: Fix oops when splicing skbs from a frag_list.Jarek Poplawski
Lennert Buytenhek wrote: > Since 4fb669948116d928ae44262ab7743732c574630d ("net: Optimize memory > usage when splicing from sockets.") I'm seeing this oops (e.g. in > 2.6.30-rc3) when splicing from a TCP socket to /dev/null on a driver > (mv643xx_eth) that uses LRO in the skb mode (lro_receive_skb) rather > than the frag mode: My patch incorrectly assumed skb->sk was always valid, but for "frag_listed" skbs we can only use skb->sk of their parent. Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Debugged-by: Lennert Buytenhek <buytenh@wantstofly.org> Tested-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-29Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/isdn/00-INDEX drivers/net/wireless/iwlwifi/iwl-scan.c drivers/net/wireless/rndis_wlan.c net/mac80211/main.c
2009-04-29Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2009-04-29mac80211: default to automatic power controlJohannes Berg
In "mac80211: correct wext transmit power handler" I fixed the wext handler, but forgot to make the default of the user_power_level -1 (aka "auto"), so that now the transmit power is always set to 0, causing associations to time out and similar problems since we're transmitting with very little power. Correct this by correcting the default user_power_level to -1. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Bisected-by: Niel Lambrechts <niel.lambrechts@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-29mac80211: fix modprobe deadlock by not calling wep_init under rtnl_lockAlan Jenkins
- ieee80211_wep_init(), which is called with rtnl_lock held, blocks in request_module() [waiting for modprobe to load a crypto module]. - modprobe blocks in a call to flush_workqueue(), when it closes a TTY [presumably when it exits]. - The workqueue item linkwatch_event() blocks on rtnl_lock. There's no reason for wep_init() to be called with rtnl_lock held, so just move it outside the critical section. Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-28Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6
2009-04-28netfilter: revised locking for x_tablesStephen Hemminger
The x_tables are organized with a table structure and a per-cpu copies of the counters and rules. On older kernels there was a reader/writer lock per table which was a performance bottleneck. In 2.6.30-rc, this was converted to use RCU and the counters/rules which solved the performance problems for do_table but made replacing rules much slower because of the necessary RCU grace period. This version uses a per-cpu set of spinlocks and counters to allow to table processing to proceed without the cache thrashing of a global reader lock and keeps the same performance for table updates. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-28Bluetooth: Fix connection establishment with low security requirementMarcel Holtmann
The Bluetooth 2.1 specification introduced four different security modes that can be mapped using Legacy Pairing and Simple Pairing. With the usage of Simple Pairing it is required that all connections (except the ones for SDP) are encrypted. So even the low security requirement mandates an encrypted connection when using Simple Pairing. When using Legacy Pairing (for Bluetooth 2.0 devices and older) this is not required since it causes interoperability issues. To support this properly the low security requirement translates into different host controller transactions depending if Simple Pairing is supported or not. However in case of Simple Pairing the command to switch on encryption after a successful authentication is not triggered for the low security mode. This patch fixes this and actually makes the logic to differentiate between Simple Pairing and Legacy Pairing a lot simpler. Based on a report by Ville Tervo <ville.tervo@nokia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-04-28Bluetooth: Add different pairing timeout for Legacy PairingMarcel Holtmann
The Bluetooth stack uses a reference counting for all established ACL links and if no user (L2CAP connection) is present, the link will be terminated to save power. The problem part is the dedicated pairing when using Legacy Pairing (Bluetooth 2.0 and before). At that point no user is present and pairing attempts will be disconnected within 10 seconds or less. In previous kernel version this was not a problem since the disconnect timeout wasn't triggered on incoming connections for the first time. However this caused issues with broken host stacks that kept the connections around after dedicated pairing. When the support for Simple Pairing got added, the link establishment procedure needed to be changed and now causes issues when using Legacy Pairing When using Simple Pairing it is possible to do a proper reference counting of ACL link users. With Legacy Pairing this is not possible since the specification is unclear in some areas and too many broken Bluetooth devices have already been deployed. So instead of trying to deal with all the broken devices, a special pairing timeout will be introduced that increases the timeout to 60 seconds when pairing is triggered. If a broken devices now puts the stack into an unforeseen state, the worst that happens is the disconnect timeout triggers after 120 seconds instead of 4 seconds. This allows successful pairings with legacy and broken devices now. Based on a report by Johan Hedberg <johan.hedberg@nokia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-04-28Bluetooth: Ensure that HCI sysfs add/del is preempt safeRoger Quadros
Use a different work_struct variables for add_conn() and del_conn() and use single work queue instead of two for adding and deleting connections. It eliminates the following error on a preemptible kernel: [ 204.358032] Unable to handle kernel NULL pointer dereference at virtual address 0000000c [ 204.370697] pgd = c0004000 [ 204.373443] [0000000c] *pgd=00000000 [ 204.378601] Internal error: Oops: 17 [#1] PREEMPT [ 204.383361] Modules linked in: vfat fat rfcomm sco l2cap sd_mod scsi_mod iphb pvr2d drm omaplfb ps [ 204.438537] CPU: 0 Not tainted (2.6.28-maemo2 #1) [ 204.443664] PC is at klist_put+0x2c/0xb4 [ 204.447601] LR is at klist_put+0x18/0xb4 [ 204.451568] pc : [<c0270f08>] lr : [<c0270ef4>] psr: a0000113 [ 204.451568] sp : cf1b3f10 ip : cf1b3f10 fp : cf1b3f2c [ 204.463104] r10: 00000000 r9 : 00000000 r8 : bf08029c [ 204.468353] r7 : c7869200 r6 : cfbe2690 r5 : c78692c8 r4 : 00000001 [ 204.474945] r3 : 00000001 r2 : cf1b2000 r1 : 00000001 r0 : 00000000 [ 204.481506] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel [ 204.488861] Control: 10c5387d Table: 887fc018 DAC: 00000017 [ 204.494628] Process btdelconn (pid: 515, stack limit = 0xcf1b22e0) Signed-off-by: Roger Quadros <ext-roger.quadros@nokia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-04-28inet_diag: Remove dup assignmentsArnaldo Carvalho de Melo
These are later assigned to other values without being used meanwhile. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-28net: Avoid extra wakeups of threads blocked in wait_for_packet()Eric Dumazet
In 2.6.25 we added UDP mem accounting. This unfortunatly added a penalty when a frame is transmitted, since we have at TX completion time to call sock_wfree() to perform necessary memory accounting. This calls sock_def_write_space() and utimately scheduler if any thread is waiting on the socket. Thread(s) waiting for an incoming frame was scheduled, then had to sleep again as event was meaningless. (All threads waiting on a socket are using same sk_sleep anchor) This adds lot of extra wakeups and increases latencies, as noted by Christoph Lameter, and slows down softirq handler. Reference : http://marc.info/?l=linux-netdev&m=124060437012283&w=2 Fortunatly, Davide Libenzi recently added concept of keyed wakeups into kernel, and particularly for sockets (see commit 37e5540b3c9d838eb20f2ca8ea2eb8072271e403 epoll keyed wakeups: make sockets use keyed wakeups) Davide goal was to optimize epoll, but this new wakeup infrastructure can help non epoll users as well, if they care to setup an appropriate handler. This patch introduces new DEFINE_WAIT_FUNC() helper and uses it in wait_for_packet(), so that only relevant event can wakeup a thread blocked in this function. Trace of function calls from bnx2 TX completion bnx2_poll_work() is : __kfree_skb() skb_release_head_state() sock_wfree() sock_def_write_space() __wake_up_sync_key() __wake_up_common() receiver_wake_function() : Stops here since thread is waiting for an INPUT Reported-by: Christoph Lameter <cl@linux.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-28sctp: add feature bit for SCTP offload in hardwareJesse Brandeburg
this is the sctp code to enable hardware crc32c offload for adapters that support it. Originally by: Vlad Yasevich <vladislav.yasevich@hp.com> modified by Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27gro: Fix COMPLETE checksum handlingHerbert Xu
On a brand new GRO skb, we cannot call ip_hdr since the header may lie in the non-linear area. This patch adds the helper skb_gro_network_header to handle this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27gro: Fix handling of headers that extend over the tailHerbert Xu
The skb_gro_* code fails to handle the case where a header starts in the linear area but ends in the frags area. Since the goal of skb_gro_* is to optimise the case of completely non-linear packets, we can simply bail out if we have anything in the linear area. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27ipv4: Limit size of route cache hash tableAnton Blanchard
Right now we have no upper limit on the size of the route cache hash table. On a 128GB POWER6 box it ends up as 32MB: IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes) It would be nice to cap this for memory consumption reasons, but a massive hashtable also causes a significant spike when measuring OS jitter. With a 32MB hashtable and 4 million entries, rt_worker_func is taking 5 ms to complete. On another system with more memory it's taking 14 ms. Even though rt_worker_func does call cond_sched() to limit its impact, in an HPC environment we want to keep all sources of OS jitter to a minimum. With the patch applied we limit the number of entries to 512k which can still be overriden by using the rt_entries boot option: IP route cache hash table entries: 524288 (order: 6, 4194304 bytes) With this patch rt_worker_func now takes 0.460 ms on the same system. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27drop_monitor: Update netlink protocol to include netlink attribute header in ↵Neil Horman
alert message When I initially implemented this protocol, I disregarded the use of netlink attribute headers, thinking for my purposes they weren't needed. I've come to find out that, as I'm starting to work with sending down messages with associated data (like config messages), the kernel code spits out warnings about trailing data in a netlink skb that doesn't have an associated header on it. As such, I'm going to start including attribute headers in my netlink transaction, and so for completeness, I should likely include them on messages bound from the kernel to user space. This patch adds that header to the kernel, and bumps the protocol version accordingly Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27xfrm: wrong hash value for temporary SANicolas Dichtel
When kernel inserts a temporary SA for IKE, it uses the wrong hash value for dst list. Two hash values were calcultated before: one with source address and one with a wildcard source address. Bug hinted by Junwei Zhang <junwei.zhang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27snmp: add missing counters for RFC 4293Neil Horman
The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and OutMcastOctets: http://tools.ietf.org/html/rfc4293 But it seems we don't track those in any way that easy to separate from other protocols. This patch adds those missing counters to the stats file. Tested successfully by me With help from Eric Dumazet. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-25vlan: update vlan carrier state for admin up/downJay Vosburgh
Currently, the VLAN event handler does not adjust the VLAN device's carrier state when the real device or the VLAN device is set administratively up or down. The following patch adds a transfer of operating state from the real device to the VLAN device when the real device is administratively set up or down, and sets the carrier state up or down during init, open and close of the VLAN device. This permits observers above the VLAN device that care about the carrier state (bonding's link monitor, for example) to receive updates for administrative changes by more closely mimicing the behavior of real devices. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-04-25Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
2009-04-25Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 Conflicts: net/mac80211/pm.c
2009-04-24wireless: remove some (bogus?) 'may be used uninitialized' warningsJohn W. Linville
net/mac80211/tx.c: In function ‘ieee80211_tx_h_select_key’: net/mac80211/tx.c:448: warning: ‘key’ may be used uninitialized in this function drivers/net/wireless/ath/ath9k/rc.c: In function ‘ath_rc_rate_getidx’: drivers/net/wireless/ath/ath9k/rc.c:815: warning: ‘nextindex’ may be used uninitialized in this function drivers/net/wireless/hostap/hostap_plx.c: In function ‘prism2_plx_probe’: drivers/net/wireless/hostap/hostap_plx.c:438: warning: ‘cor_index’ may be used uninitialized in this function drivers/net/wireless/hostap/hostap_plx.c:438: warning: ‘cor_offset’ may be used uninitialized in this function Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-04-24netfilter: xt_recent: fix stack overread in compat codeJan Engelhardt
Related-to: commit 325fb5b4d26038cba665dd0d8ee09555321061f0 The compat path suffers from a similar problem. It only uses a __be32 when all of the recent code uses, and expects, an nf_inet_addr everywhere. As a result, addresses stored by xt_recents were filled with whatever other stuff was on the stack following the be32. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> With a minor compile fix from Roman. Reported-and-tested-by: Roman Hoog Antink <rha@open.ch> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-04-24netfilter: nf_ct_dccp: add missing role attributes for DCCPPablo Neira Ayuso
This patch adds missing role attribute to the DCCP type, otherwise the creation of entries is not of any use. The attribute added is CTA_PROTOINFO_DCCP_ROLE which contains the role of the conntrack original tuple. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-04-24netfilter: Kconfig: TProxy doesn't depend on NF_CONNTRACKLaszlo Attila Toth
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-04-24netfilter: nf_ct_dccp/udplite: fix protocol registration errorPatrick McHardy
Commit d0dba725 (netfilter: ctnetlink: add callbacks to the per-proto nlattrs) changed the protocol registration function to abort if the to-be registered protocol doesn't provide a new callback function. The DCCP and UDP-Lite IPv6 protocols were missed in this conversion, add the required callback pointer. Reported-and-tested-by: Steven Jan Springl <steven@springl.ukfsn.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-04-23af_iucv: Fix merge.Ursula Braun
From: Ursula Braun <ubraun@linux.vnet.ibm.com> net/iucv/af_iucv.c in net-next-2.6 is almost correct. 4 lines should still be deleted. These are the remaining changes: Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-23Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/iucv/af_iucv.c
2009-04-23af_iucv: New socket option for setting IUCV MSGLIMITsHendrik Brueckner
The SO_MSGLIMIT socket option modifies the message limit for new IUCV communication paths. The message limit specifies the maximum number of outstanding messages that are allowed for connections. This setting can be lowered by z/VM when an IUCV connection is established. Expects an integer value in the range of 1 to 65535. The default value is 65535. The message limit must be set before calling connect() or listen() for sockets. If sockets are already connected or in state listen, changing the message limit is not supported. For reading the message limit value, unconnected sockets return the limit that has been set or the default limit. For connected sockets, the actual message limit is returned. The actual message limit is assigned by z/VM for each connection and it depends on IUCV MSGLIMIT authorizations specified for the z/VM guest virtual machine. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-23af_iucv: cleanup and refactor recvmsg() EFAULT handlingHendrik Brueckner
If the skb cannot be copied to user iovec, always return -EFAULT. The skb is enqueued again, except MSG_PEEK flag is set, to allow user space applications to correct its iovec pointer. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>