Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2009-06-22	netfilter: xt_rateest: fix comparison with self	Patrick McHardy
	As noticed by Török Edwin <edwintorok@gmail.com>: Compiling the kernel with clang has shown this warning: net/netfilter/xt_rateest.c:69:16: warning: self-comparison always results in a constant value ret &= pps2 == pps2; ^ Looking at the code: if (info->flags & XT_RATEEST_MATCH_BPS) ret &= bps1 == bps2; if (info->flags & XT_RATEEST_MATCH_PPS) ret &= pps2 == pps2; Judging from the MATCH_BPS case it seems to be a typo, with the intention of comparing pps1 with pps2. http://bugzilla.kernel.org/show_bug.cgi?id=13535 Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22	netfilter: xt_quota: fix incomplete initialization	Jan Engelhardt
	Commit v2.6.29-rc5-872-gacc738f ("xtables: avoid pointer to self") forgot to copy the initial quota value supplied by iptables into the private structure, thus counting from whatever was in the memory kmalloc returned. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22	netfilter: nf_log: fix direct userspace memory access in proc handler	Patrick McHardy
	Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22	netfilter: fix some sparse endianess warnings	Patrick McHardy
	net/netfilter/xt_NFQUEUE.c:46:9: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:46:9: expected unsigned int [unsigned] [usertype] ipaddr net/netfilter/xt_NFQUEUE.c:46:9: got restricted unsigned int net/netfilter/xt_NFQUEUE.c:68:10: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:68:10: expected unsigned int [unsigned] <noident> net/netfilter/xt_NFQUEUE.c:68:10: got restricted unsigned int net/netfilter/xt_NFQUEUE.c:69:10: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:69:10: expected unsigned int [unsigned] <noident> net/netfilter/xt_NFQUEUE.c:69:10: got restricted unsigned int net/netfilter/xt_NFQUEUE.c:70:10: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:70:10: expected unsigned int [unsigned] <noident> net/netfilter/xt_NFQUEUE.c:70:10: got restricted unsigned int net/netfilter/xt_NFQUEUE.c:71:10: warning: incorrect type in assignment (different base types) net/netfilter/xt_NFQUEUE.c:71:10: expected unsigned int [unsigned] <noident> net/netfilter/xt_NFQUEUE.c:71:10: got restricted unsigned int net/netfilter/xt_cluster.c:20:55: warning: incorrect type in return expression (different base types) net/netfilter/xt_cluster.c:20:55: expected unsigned int net/netfilter/xt_cluster.c:20:55: got restricted unsigned int const [usertype] ip net/netfilter/xt_cluster.c:20:55: warning: incorrect type in return expression (different base types) net/netfilter/xt_cluster.c:20:55: expected unsigned int net/netfilter/xt_cluster.c:20:55: got restricted unsigned int const [usertype] ip Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22	netfilter: nf_conntrack: fix conntrack lookup race	Patrick McHardy
	The RCU protected conntrack hash lookup only checks whether the entry has a refcount of zero to decide whether it is stale. This is not sufficient, entries are explicitly removed while there is at least one reference left, possibly more. Explicitly check whether the entry has been marked as dying to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22	netfilter: nf_conntrack: fix confirmation race condition	Patrick McHardy
	New connection tracking entries are inserted into the hash before they are fully set up, namely the CONFIRMED bit is not set and the timer not started yet. This can theoretically lead to a race with timer, which would set the timeout value to a relative value, most likely already in the past. Perform hash insertion as the final step to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22	netfilter: nf_conntrack: death_by_timeout() fix	Eric Dumazet
	death_by_timeout() might delete a conntrack from hash list and insert it in dying list. nf_ct_delete_from_lists(ct); nf_ct_insert_dying_list(ct); I believe a (lockless) reader could catch ct while doing a lookup and miss the end of its chain. (nulls lookup algo must check the null value at the end of lookup and should restart if the null value is not the expected one. cf Documentation/RCU/rculist_nulls.txt for details) We need to change nf_conntrack_init_net() and use a different "null" value, guaranteed not being used in regular lists. Choose very large values, since hash table uses [0..size-1] null values. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-20	Merge branch 'master' of â†µ	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2009-06-20	ipv4: fix NULL pointer + success return in route lookup path	Neil Horman
	Don't drop route if we're not caching I recently got a report of an oops on a route lookup. Maxime was testing what would happen if route caching was turned off (doing so by setting making rt_caching always return 0), and found that it triggered an oops. I looked at it and found that the problem stemmed from the fact that the route lookup routines were returning success from their lookup paths (which is good), but never set the *rp pointer to anything (which is bad). This happens because in rt_intern_hash, if rt_caching returns false, we call rt_drop and return 0. This almost emulates slient success. What we should be doing is assigning rp = rt and _not_ dropping the route. This way, during slow path lookups, when we create a new route cache entry, we don't immediately discard it, rather we just don't add it into the cache hash table, but we let this one lookup use it for the purpose of this route request. Maxime has tested and reports it prevents the oops. There is still a subsequent routing issue that I'm looking into further, but I'm confident that, even if its related to this same path, this patch makes sense to take. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-19	cfg80211: validate station settings	Johannes Berg
	When I disallowed interfering with stations on non-AP interfaces, I not only forget mesh but also managed interfaces which need this for the authorized flag. Let's actually validate everything properly. This fixes an nl80211 regression introduced by the interfering, under which wpa_supplicant -Dnl80211 could not properly connect. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-19	cfg80211: allow setting station parameters in mesh	Andrey Yurovsky
	Mesh Point interfaces can also set parameters, for example plink_open is used to manually establish peer links from user-space (currently via iw). Add Mesh Point to the check in nl80211_set_station. Signed-off-by: Andrey Yurovsky <andrey@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-19	cfg80211: allow adding/deleting stations on mesh	Andrey Yurovsky
	Commit b2a151a288 added a check that prevents adding or deleting stations on non-AP interfaces. Adding and deleting stations is supported for Mesh Point interfaces, so add Mesh Point to that check as well. Signed-off-by: Andrey Yurovsky <andrey@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-19	rfkill: export persistent attribute in sysfs	Alan Jenkins
	This information allows userspace to implement a hybrid policy where it can store the rfkill soft-blocked state in platform non-volatile storage if available, and if not then file-based storage can be used. Some users prefer platform non-volatile storage because of the behaviour when dual-booting multiple versions of Linux, or if the rfkill setting is changed in the BIOS setting screens, or if the BIOS responds to wireless-toggle hotkeys itself before the relevant platform driver has been loaded. Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-19	rfkill: don't restore software blocked state on persistent devices	Alan Jenkins
	The setting of the "persistent" flag is also made more explicit using a new rfkill_init_sw_state() function, instead of special-casing rfkill_set_sw_state() when it is called before registration. Suspend is a bit of a corner case so we try to get away without adding another hack to rfkill-input - it's going to be removed soon. If the state does change over suspend, users will simply have to prod rfkill-input twice in order to toggle the state. Userspace policy agents will be able to implement a more consistent user experience. For example, they can avoid the above problem if they toggle devices individually. Then there would be no "global state" to get out of sync. Currently there are only two rfkill drivers with persistent soft-blocked state. thinkpad-acpi already checks the software state on resume. eeepc-laptop will require modification. Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk> CC: Marcel Holtmann <marcel@holtmann.org> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-19	rfkill: rfkill_set_block() when suspended nitpick	Alan Jenkins
	If we return after fiddling with the state, userspace will see the wrong state and rfkill_set_sw_state() won't work until the next call to rfkill_set_block(). At the moment rfkill_set_block() will always be called from rfkill_resume(), but this will change in future. Also, presumably the point of this test is to avoid bothering devices which may be suspended. If we don't want to call set_block(), we probably don't want to call query() either :-). Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-19	ieee802154: use standard routine for printing dumps	Dmitry Baryshkov
	Use print_hex_dump_bytes instead of self-written dumping function for outputting packet dumps. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-19	af_iucv: Return -EAGAIN if iucv msg limit is exceeded	Hendrik Brueckner
	If the iucv message limit for a communication path is exceeded, sendmsg() returns -EAGAIN instead of -EPIPE. The calling application can then handle this error situtation, e.g. to try again after waiting some time. For blocking sockets, sendmsg() waits up to the socket timeout before returning -EAGAIN. For the new wait condition, a macro has been introduced and the iucv_sock_wait_state() has been refactored to this macro. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-19	af_iucv: Change if condition in sendmsg() for more readability	Hendrik Brueckner
	Change the if condition to exit sendmsg() if the socket in not connected. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18	atm: sk_wmem_alloc initial value is one	Eric Dumazet
	commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) changed initial sk_wmem_alloc value. This broke net/atm since this protocol assumed a null initial value. This patch makes necessary changes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18	net: correct off-by-one write allocations reports	Eric Dumazet
	commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) changed initial sk_wmem_alloc value. We need to take into account this offset when reporting sk_wmem_alloc to user, in PROC_FS files or various ioctls (SIOCOUTQ/TIOCOUTQ) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18	net: group address list and its count	Jiri Pirko
	This patch is inspired by patch recently posted by Johannes Berg. Basically what my patch does is to group list and a count of addresses into newly introduced structure netdev_hw_addr_list. This brings us two benefits: 1) struct net_device becames a bit nicer. 2) in the future there will be a possibility to operate with lists independently on netdevices (with exporting right functions). I wanted to introduce this patch before I'll post a multicast lists conversion. Signed-off-by: Jiri Pirko <jpirko@redhat.com> drivers/net/bnx2.c \| 4 +- drivers/net/e1000/e1000_main.c \| 4 +- drivers/net/ixgbe/ixgbe_main.c \| 6 +- drivers/net/mv643xx_eth.c \| 2 +- drivers/net/niu.c \| 4 +- drivers/net/virtio_net.c \| 10 ++-- drivers/s390/net/qeth_l2_main.c \| 2 +- include/linux/netdevice.h \| 17 +++-- net/core/dev.c \| 130 ++++++++++++++++++-------------------- 9 files changed, 89 insertions(+), 90 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18	ipv4: Fix fib_trie rebalancing, part 2	Jarek Poplawski
	My previous patch, which explicitly delays freeing of tnodes by adding them to the list to flush them after the update is finished, isn't strict enough. It treats exceptionally tnodes without parent, assuming they are newly created, so "invisible" for the read side yet. But the top tnode doesn't have parent as well, so we have to exclude all exceptions (at least until a better way is found). Additionally we need to move rcu assignment of this node before flushing, so the return type of the trie_rebalance() function is changed. Reported-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17	pkt_sched: Update drops stats in act_police	Jarek Poplawski
	Action police statistics could be misleading because drops are not shown when expected. With feedback from: Jamal Hadi Salim <hadi@cyberus.ca> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17	skbuff: don't corrupt mac_header on skb expansion	Stephen Hemminger
	The skb mac_header field is sometimes NULL (or ~0u) as a sentinel value. The places where skb is expanded add an offset which would change this flag into an invalid pointer (or offset). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17	skbuff: skb_mac_header_was_set is always true on >32 bit	Stephen Hemminger
	Looking at the crash in log_martians(), one suspect is that the check for mac header being set is not correct. The value of mac_header defaults to 0 on allocation, therefore skb_mac_header_was_set will always be true on platforms using NET_SKBUFF_USES_OFFSET. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17	net: sk_wmem_alloc has initial value of one, not zero	Eric Dumazet
	commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) changed initial sk_wmem_alloc value. Some protocols check sk_wmem_alloc value to determine if a timer must delay socket deallocation. We must take care of the sk_wmem_alloc value being one instead of zero when no write allocations are pending. Reported by Ingo Molnar, and full diagnostic from David Miller. This patch introduces three helpers to get read/write allocations and a followup patch will use these helpers to report correct write allocations to user. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-16	x25: Fix sleep from timer on socket destroy.	David S. Miller
	If socket destuction gets delayed to a timer, we try to lock_sock() from that timer which won't work. Use bh_lock_sock() in that case. Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Ingo Molnar <mingo@elte.hu>
2009-06-15	mac80211: fix wext bssid/ssid setting	Johannes Berg
	When changing to a new BSSID or SSID, the code in ieee80211_set_disassoc() needs to have the old data still valid to be able to disconnect and clean up properly. Currently, however, the old data is thrown away before ieee80211_set_disassoc() is ever called, so fix that by calling the function _before_ the old data is overwritten. This is (one of) the issue(s) causing mac80211 to hold cfg80211's BSS structs forever, and them thus being returned in scan results after they're long gone. http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=2015 Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-15	mac80211: disconnect when user changes channel	Johannes Berg
	If we do not disconnect when a channel switch is requested, we end up eventually detection beacon loss from the AP and then disconnecting, without ever really telling the AP, so we might just as well disconnect right away. Additionally, this fixes a problem with iwlwifi where the driver will clear some internal state on channel changes like this and then get confused when we actually go clear that state from mac80211. It may look like this patch drops the no-IBSS check, but that is already handled by cfg80211 in the wext handler it provides for IBSS (cfg80211_ibss_wext_siwfreq). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-15	mac80211: add queue debugfs file	Johannes Berg
	I suspect that some driver bugs can cause queues to be stopped while they shouldn't be, but it's hard to find out whether that is the case or not without having any visible information about the queues. This adds a file to debugfs that allows us to see the queues' statuses. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-15	mac80211: Do not try to associate with an empty SSID	Jouni Malinen
	It looks like some programs (e.g., NM) are setting an empty SSID with SIOCSIWESSID in some cases. This seems to trigger mac80211 to try to associate with an invalid configuration (wildcard SSID) which will result in failing associations (or odd issues, potentially including kernel panic with some drivers) if the AP were to actually accept this anyway). Only start association process if the SSID is actually set. This speeds up connection with NM in number of cases and avoids sending out broken association request frames. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-15	Merge branch 'master' of â†µ	David S. Miller
	master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/scsi/fcoe/fcoe.c net/core/drop_monitor.c net/core/net-traces.c
2009-06-15	pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US	Jarek Poplawski
	Let's use TICKS instead of US, so PSCHED_TICKS2NS and PSCHED_NS2TICKS (like in PSCHED_TICKS_PER_SEC already) to avoid misleading. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-15	ipv4: Fix fib_trie rebalancing	Jarek Poplawski
	While doing trie_rebalance(): resize(), inflate(), halve() RCU free tnodes before updating their parents. It depends on RCU delaying the real destruction, but if RCU readers start after call_rcu() and before parent update they could access freed memory. It is currently prevented with preempt_disable() on the update side, but it's not safe, except maybe classic RCU, plus it conflicts with memory allocations with GFP_KERNEL flag used from these functions. This patch explicitly delays freeing of tnodes by adding them to the list, which is flushed after the update is finished. Reported-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-14	Merge branch 'for-linus' of â†µ	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (31 commits) trivial: remove the trivial patch monkey's name from SubmittingPatches trivial: Fix a typo in comment of addrconf_dad_start() trivial: usb: fix missing space typo in doc trivial: pci hotplug: adding __init/__exit macros to sgi_hotplug trivial: Remove the hyphen from git commands trivial: fix ETIMEOUT -> ETIMEDOUT typos trivial: Kconfig: .ko is normally not included in module names trivial: SubmittingPatches: fix typo trivial: Documentation/dell_rbu.txt: fix typos trivial: Fix Pavel's address in MAINTAINERS trivial: ftrace:fix description of trace directory trivial: unnecessary (void*) cast removal in sound/oss/msnd.c trivial: input/misc: Fix typo in Kconfig trivial: fix grammo in bus_for_each_dev() kerneldoc trivial: rbtree.txt: fix rb_entry() parameters in sample code trivial: spelling fix in ppc code comments trivial: fix typo in bio_alloc kernel doc trivial: Documentation/rbtree.txt: cleanup kerneldoc of rbtree.txt trivial: Miscellaneous documentation typo fixes trivial: fix typo milisecond/millisecond for documentation and source comments. ...
2009-06-14	Bluetooth: Fix Kconfig issue with RFKILL integration	Marcel Holtmann
	Since the re-write of the RFKILL subsystem it is no longer good to just select RFKILL, but it is important to add a proper depends on rule. Based on a report by Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-06-14	PIM-SM: namespace changes	Tom Goff
	IPv4: - make PIM register vifs netns local - set the netns when a PIM register vif is created - make PIM available in all network namespaces (if CONFIG_IP_PIMSM_V2) by adding the protocol handler when multicast routing is initialized IPv6: - make PIM register vifs netns local - make PIM available in all network namespaces (if CONFIG_IPV6_PIMSM_V2) by adding the protocol handler when multicast routing is initialized Signed-off-by: Tom Goff <thomas.goff@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-13	ipv4: update ARPD help text	Timo TerÃ¤s
	Removed the statements about ARP cache size as this config option does not affect it. The cache size is controlled by neigh_table gc thresholds. Remove also expiremental and obsolete markings as the API originally intended for arp caching is useful for implementing ARP-like protocols (e.g. NHRP) in user space and has been there for a long enough time. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-13	net: use a deferred timer in rt_check_expire	Eric Dumazet
	For the sake of power saver lovers, use a deferrable timer to fire rt_check_expire() As some big routers cache equilibrium depends on garbage collection done in time, we take into account elapsed time between two rt_check_expire() invocations to adjust the amount of slots we have to check. Based on an initial idea and patch from Tero Kristo Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Tero Kristo <tero.kristo@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-13	Merge branch 'master' of â†µ	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2009-06-13	x_tables: Convert printk to pr_err	Joe Perches
	Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13	netfilter: conntrack: optional reliable conntrack event delivery	Pablo Neira Ayuso
	This patch improves ctnetlink event reliability if one broadcast listener has set the NETLINK_BROADCAST_ERROR socket option. The logic is the following: if an event delivery fails, we keep the undelivered events in the missed event cache. Once the next packet arrives, we add the new events (if any) to the missed events in the cache and we try a new delivery, and so on. Thus, if ctnetlink fails to deliver an event, we try to deliver them once we see a new packet. Therefore, we may lose state transitions but the userspace process gets in sync at some point. At worst case, if no events were delivered to userspace, we make sure that destroy events are successfully delivered. Basically, if ctnetlink fails to deliver the destroy event, we remove the conntrack entry from the hashes and we insert them in the dying list, which contains inactive entries. Then, the conntrack timer is added with an extra grace timeout of random32() % 15 seconds to trigger the event again (this grace timeout is tunable via /proc). The use of a limited random timeout value allows distributing the "destroy" resends, thus, avoiding accumulating lots "destroy" events at the same time. Event delivery may re-order but we can identify them by means of the tuple plus the conntrack ID. The maximum number of conntrack entries (active or inactive) is still handled by nf_conntrack_max. Thus, we may start dropping packets at some point if we accumulate a lot of inactive conntrack entries that did not successfully report the destroy event to userspace. During my stress tests consisting of setting a very small buffer of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket flag, and generating lots of very small connections, I noticed very few destroy entries on the fly waiting to be resend. A simple way to test this patch consist of creating a lot of entries, set a very small Netlink buffer in conntrackd (+ a patch which is not in the git tree to set the BROADCAST_ERROR flag) and invoke `conntrack -F'. For expectations, no changes are introduced in this patch. Currently, event delivery is only done for new expectations (no events from expectation expiration, removal and confirmation). In that case, they need a per-expectation event cache to implement the same idea that is exposed in this patch. This patch can be useful to provide reliable flow-accouting. We still have to add a new conntrack extension to store the creation and destroy time. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13	netfilter: conntrack: move helper destruction to nf_ct_helper_destroy()	Pablo Neira Ayuso
	This patch moves the helper destruction to a function that lives in nf_conntrack_helper.c. This new function is used in the patch to add ctnetlink reliable event delivery. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13	netfilter: conntrack: move event caching to conntrack extension infrastructure	Pablo Neira Ayuso
	This patch reworks the per-cpu event caching to use the conntrack extension infrastructure. The main drawback is that we consume more memory per conntrack if event delivery is enabled. This patch is required by the reliable event delivery that follows to this patch. BTW, this patch allows you to enable/disable event delivery via /proc/sys/net/netfilter/nf_conntrack_events in runtime, although you can still disable event caching as compilation option. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13	netfilter: nf_conntrack: use mod_timer_pending() for conntrack refresh	Patrick McHardy
	Use mod_timer_pending() instead of atomic sequence of del_timer()/ add_timer(). mod_timer_pending() does not rearm an inactive timer, so we don't need the conntrack lock anymore to make sure we don't accidentally rearm a timer of a conntrack which is in the process of being destroyed. With this change, we don't need to take the global lock anymore at all, counter updates can be performed under the per-conntrack lock. Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13	netfilter: nf_log: fix sleeping function called from invalid context	Patrick McHardy
	Fix regression introduced by 17625274 "netfilter: sysctl support of logger choice": BUG: sleeping function called from invalid context at /mnt/s390test/linux-2.6-tip/arch/s390/include/asm/uaccess.h:234 in_atomic(): 1, irqs_disabled(): 0, pid: 3245, name: sysctl CPU: 1 Not tainted 2.6.30-rc8-tipjun10-02053-g39ae214 #1 Process sysctl (pid: 3245, task: 000000007f675da0, ksp: 000000007eb17cf0) 0000000000000000 000000007eb17be8 0000000000000002 0000000000000000 000000007eb17c88 000000007eb17c00 000000007eb17c00 0000000000048156 00000000003e2de8 000000007f676118 000000007eb17f10 0000000000000000 0000000000000000 000000007eb17be8 000000000000000d 000000007eb17c58 00000000003e2050 000000000001635c 000000007eb17be8 000000007eb17c30 Call Trace: (Ý<00000000000162e6>¨ show_trace+0x13a/0x148) Ý<00000000000349ea>¨ __might_sleep+0x13a/0x164 Ý<0000000000050300>¨ proc_dostring+0x134/0x22c Ý<0000000000312b70>¨ nf_log_proc_dostring+0xfc/0x188 Ý<0000000000136f5e>¨ proc_sys_call_handler+0xf6/0x118 Ý<0000000000136fda>¨ proc_sys_read+0x26/0x34 Ý<00000000000d6e9c>¨ vfs_read+0xac/0x158 Ý<00000000000d703e>¨ SyS_read+0x56/0x88 Ý<0000000000027f42>¨ sysc_noemu+0x10/0x16 Use the nf_log_mutex instead of RCU to fix this. Reported-and-tested-by: Maran Pakkirisamy <maranpsamy@in.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13	net: use symbolic values for ndo_start_xmit() return codes	Patrick McHardy
	Convert magic values 1 and -1 to NETDEV_TX_BUSY and NETDEV_TX_LOCKED respectively. 0 (NETDEV_TX_OK) is not changed to keep the noise down, except in very few cases where its in direct proximity to one of the other values. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-13	net: fix network drivers ndo_start_xmit() return values (part 7)	Patrick McHardy
	Fix up ATM drivers that return an errno value to qdisc_restart(), causing qdisc_restart() to print a warning an requeue/retransmit the skb. - lec: condition can only be remedied by userspace, until that retransmissions Compile tested only. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-12	trivial: Fix a typo in comment of addrconf_dad_start()	Masatake YAMATO
	Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12	trivial: Kconfig: .ko is normally not included in module names	Pavel Machek
	.ko is normally not included in Kconfig help, make it consistent. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>