aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2009-03-16netfilter: xtables: add cluster matchPablo Neira Ayuso
This patch adds the iptables cluster match. This match can be used to deploy gateway and back-end load-sharing clusters. The cluster can be composed of 32 nodes maximum (although I have only tested this with two nodes, so I cannot tell what is the real scalability limit of this solution in terms of cluster nodes). Assuming that all the nodes see all packets (see below for an example on how to do that if your switch does not allow this), the cluster match decides if this node has to handle a packet given: (jhash(source IP) % total_nodes) & node_mask For related connections, the master conntrack is used. The following is an example of its use to deploy a gateway cluster composed of two nodes (where this is the node 1): iptables -I PREROUTING -t mangle -i eth1 -m cluster \ --cluster-total-nodes 2 --cluster-local-node 1 \ --cluster-proc-name eth1 -j MARK --set-mark 0xffff iptables -A PREROUTING -t mangle -i eth1 \ -m mark ! --mark 0xffff -j DROP iptables -A PREROUTING -t mangle -i eth2 -m cluster \ --cluster-total-nodes 2 --cluster-local-node 1 \ --cluster-proc-name eth2 -j MARK --set-mark 0xffff iptables -A PREROUTING -t mangle -i eth2 \ -m mark ! --mark 0xffff -j DROP And the following commands to make all nodes see the same packets: ip maddr add 01:00:5e:00:01:01 dev eth1 ip maddr add 01:00:5e:00:01:02 dev eth2 arptables -I OUTPUT -o eth1 --h-length 6 \ -j mangle --mangle-mac-s 01:00:5e:00:01:01 arptables -I INPUT -i eth1 --h-length 6 \ --destination-mac 01:00:5e:00:01:01 \ -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27 arptables -I OUTPUT -o eth2 --h-length 6 \ -j mangle --mangle-mac-s 01:00:5e:00:01:02 arptables -I INPUT -i eth2 --h-length 6 \ --destination-mac 01:00:5e:00:01:02 \ -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27 In the case of TCP connections, pickup facility has to be disabled to avoid marking TCP ACK packets coming in the reply direction as valid. echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose BTW, some final notes: * This match mangles the skbuff pkt_type in case that it detects PACKET_MULTICAST for a non-multicast address. This may be done in a PKTTYPE target for this sole purpose. * This match supersedes the CLUSTERIP target. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16net: netfilter conntrack - add per-net functionality for DCCP protocolCyrill Gorcunov
Module specific data moved into per-net site and being allocated/freed during net namespace creation/deletion. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16net: sysctl_net - use net_eq to compare netsCyrill Gorcunov
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16netfilter: xtables: avoid pointer to selfJan Engelhardt
Commit 784544739a25c30637397ace5489eeb6e15d7d49 (netfilter: iptables: lock free counters) broke a number of modules whose rule data referenced itself. A reallocation would not reestablish the correct references, so it is best to use a separate struct that does not fall under RCU. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16netfilter: auto-load ip_queue module when socket openedScott James Remnant
The ip_queue module is missing the net-pf-16-proto-3 alias that would causae it to be auto-loaded when a socket of that type is opened. This patch adds the alias. Signed-off-by: Scott James Remnant <scott@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16netfilter: auto-load ip6_queue module when socket openedScott James Remnant
The ip6_queue module is missing the net-pf-16-proto-13 alias that would cause it to be auto-loaded when a socket of that type is opened. This patch adds the alias. Signed-off-by: Scott James Remnant <scott@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16netfilter: ctnetlink: move event reporting for new entries outside the lockPablo Neira Ayuso
This patch moves the event reporting outside the lock section. With this patch, the creation and update of entries is homogeneous from the event reporting perspective. Moreover, as the event reporting is done outside the lock section, the netlink broadcast delivery can benefit of the yield() call under congestion. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16netfilter: ctnetlink: cleanup conntrack update preliminary checkingsPablo Neira Ayuso
This patch moves the preliminary checkings that must be fulfilled to update a conntrack, which are the following: * NAT manglings cannot be updated * Changing the master conntrack is not allowed. This patch is a cleanup. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16netfilter: ctnetlink: cleanup master conntrack assignationPablo Neira Ayuso
This patch moves the assignation of the master conntrack to ctnetlink_create_conntrack(), which is where it really belongs. This patch is a cleanup. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16netfilter: conntrack: increase drop stats if sequence adjustment failsPablo Neira Ayuso
This patch increases the statistics of packets drop if the sequence adjustment fails in ipv4_confirm(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16netfilter: Kconfig spelling fixes (trivial)Stephen Hemminger
Signed-off-by: Stephen Hemminger <sheminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16netfilter: remove IPvX specific parts from nf_conntrack_l4proto.hChristoph Paasch
Moving the structure definitions to the corresponding IPvX specific header files. Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16netfilter: print the list of register loggersEric Leblond
This patch modifies the proc output to add display of registered loggers. The content of /proc/net/netfilter/nf_log is modified. Instead of displaying a protocol per line with format: proto:logger it now displays: proto:logger (comma_separated_list_of_loggers) NONE is used as keyword if no logger is used. Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16netfilter: use a linked list of loggersEric Leblond
This patch modifies nf_log to use a linked list of loggers for each protocol. This list of loggers is read and write protected with a mutex. This patch separates registration and binding. To be used as logging module, a module has to register calling nf_log_register() and to bind to a protocol it has to call nf_log_bind_pf(). This patch also converts the logging modules to the new API. For nfnetlink_log, it simply switchs call to register functions to call to bind function and adds a call to nf_log_register() during init. For other modules, it just remove a const flag from the logger structure and replace it with a __read_mostly. Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-24netfilter: xt_hashlimit fixEric Dumazet
Commit 784544739a25c30637397ace5489eeb6e15d7d49 (netfilter: iptables: lock free counters) broke xt_hashlimit netfilter module : This module was storing a pointer inside its xt_hashlimit_info, and this pointer is not relocated when we temporarly switch tables (iptables -L). This hack is not not needed at all (probably a leftover from ancient time), as each cpu should and can access to its own copy. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-24netfilter: nf_conntrack: account packets drop by tcp_packet()Pablo Neira Ayuso
Since tcp_packet() may return -NF_DROP in two situations, the packet-drop stats must be increased. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-20netfilter: ip_tables: unfold two critical loops in ip_packet_match()Eric Dumazet
While doing oprofile tests I noticed two loops are not properly unrolled by gcc Using a hand coded unrolled loop provides nice speedup : ipt_do_table credited of 2.52 % of cpu instead of 3.29 % in tbench. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-20netfilter: x_tables: add LED trigger targetAdam Nielsen
Kernel module providing implementation of LED netfilter target. Each instance of the target appears as a led-trigger device, which can be associated with one or more LEDs in /sys/class/leds/ Signed-off-by: Adam Nielsen <a.nielsen@shikadi.net> Acked-by: Richard Purdie <rpurdie@linux.intel.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-20netfilter: fix hardcoded size assumptionsHagen Paul Pfeifer
get_random_bytes() is sometimes called with a hard coded size assumption of an integer. This could not be true for next centuries. This patch replace it with a compile time statement. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-20netfilter: nf_conntrack: table max size should hold at least table sizeHagen Paul Pfeifer
Table size is defined as unsigned, wheres the table maximum size is defined as a signed integer. The calculation of max is 8 or 4, multiplied the table size. Therefore the max value is aligned to unsigned. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-20netfilter: iptables: lock free countersStephen Hemminger
The reader/writer lock in ip_tables is acquired in the critical path of processing packets and is one of the reasons just loading iptables can cause a 20% performance loss. The rwlock serves two functions: 1) it prevents changes to table state (xt_replace) while table is in use. This is now handled by doing rcu on the xt_table. When table is replaced, the new table(s) are put in and the old one table(s) are freed after RCU period. 2) it provides synchronization when accesing the counter values. This is now handled by swapping in new table_info entries for each cpu then summing the old values, and putting the result back onto one cpu. On a busy system it may cause sampling to occur at different times on each cpu, but no packet/byte counts are lost in the process. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Sucessfully tested on my dual quad core machine too, but iptables only (no ipv6 here) BTW, my new "tbench 8" result is 2450 MB/s, (it was 2150 MB/s not so long ago) Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-19netfilter: ip6_tables: unfold two loops in ip6_packet_match()Eric Dumazet
ip6_tables netfilter module can use an ifname_compare() helper so that two loops are unfolded. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-19netfilter: xt_physdev: unfold two loops in physdev_mt()Eric Dumazet
xt_physdev netfilter module can use an ifname_compare() helper so that two loops are unfolded. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-19netfilter: xtables: add backward-compat optionsJan Engelhardt
Concern has been expressed about the changing Kconfig options. Provide the old options that forward-select. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18netfilter: xt_physdev fixesEric Dumazet
1) physdev_mt() incorrectly assumes nulldevname[] is aligned on an int 2) It also uses word comparisons, while it could use long word ones. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18netfilter: Combine ipt_ttl and ip6t_hl sourceJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18netfilter: Combine ipt_TTL and ip6t_HL sourceJan Engelhardt
Suggested by: James King <t.james.king@gmail.com> Similarly to commit c9fd49680954714473d6cbd2546d6ff120f96840, merge TTL and HL. Since HL does not depend on any IPv6-specific function, no new module dependencies would arise. With slight adjustments to the Kconfig help text. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18netfilter: arp_tables: unfold two critical loops in arp_packet_match()Eric Dumazet
x86 and powerpc can perform long word accesses in an efficient maner. We can use this to unroll two loops in arp_packet_match(), to perform arithmetic on long words instead of bytes. This is a win on x86_64 for example. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18netfilter: log invalid new icmpv6 packet with nf_log_packet()Eric Leblond
This patch adds a logging message for invalid new icmpv6 packet. Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18netfilter: ebtables: remove unneeded initializationsStephen Hemminger
The initialization of the lock element is not needed since the lock is always initialized in ebt_register_table. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18netfilter: x_tables: remove unneeded initializationsStephen Hemminger
Later patches change the locking on xt_table and the initialization of the lock element is not needed since the lock is always initialized in xt_table_register anyway. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18netfilter: remove unneeded gotoJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18netfilter: change generic l4 protocol numberChristoph Paasch
0 is used by Hop-by-hop header and so this may cause confusion. 255 is stated as 'Reserved' by IANA. Signed-off-by: Christoph Paasch <christoph.paasch@student.uclouvain.be> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-16net: replace commatas with semicolonsThomas Gleixner
Impact: syntax fix Interestingly enough this compiles w/o any complaints: orphans = percpu_counter_sum_positive(&tcp_orphan_count), sockets = percpu_counter_sum_positive(&tcp_sockets_allocated), Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-16sctp: Inherit all socket options from parent correctly.Vlad Yasevich
During peeloff/accept() sctp needs to save the parent socket state into the new socket so that any options set on the parent are inherited by the child socket. This was found when the parent/listener socket issues SO_BINDTODEVICE, but the data was misrouted after a route cache flush. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-16sctp: Fix the RTO-doubling on idle-link heartbeatsVlad Yasevich
SCTP incorrectly doubles rto ever time a Hearbeat chunk is generated. However RFC 4960 states: On an idle destination address that is allowed to heartbeat, it is recommended that a HEARTBEAT chunk is sent once per RTO of that destination address plus the protocol parameter 'HB.interval', with jittering of +/- 50% of the RTO value, and exponential backoff of the RTO if the previous HEARTBEAT is unanswered. Essentially, of if the heartbean is unacknowledged, do we double the RTO. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-16sctp: Clean up sctp checksumming codeVlad Yasevich
The sctp crc32c checksum is always generated in little endian. So, we clean up the code to treat it as little endian and remove all the __force casts. Suggested by Herbert Xu. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-16sctp: Allow to disable SCTP checksums via module parameterLucas Nussbaum
This is a new version of my patch, now using a module parameter instead of a sysctl, so that the option is harder to find. Please note that, once the module is loaded, it is still possible to change the value of the parameter in /sys/module/sctp/parameters/, which is useful if you want to do performance comparisons without rebooting. Computation of SCTP checksums significantly affects the performance of SCTP. For example, using two dual-Opteron 246 connected using a Gbe network, it was not possible to achieve more than ~730 Mbps, compared to 941 Mbps after disabling SCTP checksums. Unfortunately, SCTP checksum offloading in NICs is not commonly available (yet). By default, checksums are still enabled, of course. Signed-off-by: Lucas Nussbaum <lucas.nussbaum@ens-lyon.fr> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-15net: pass new SIOCSHWTSTAMP through to device driversPatrick Ohly
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-15ip: support for TX timestamps on UDP and RAW socketsPatrick Ohly
Instructions for time stamping outgoing packets are take from the socket layer and later copied into the new skb. Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-15net: socket infrastructure for SO_TIMESTAMPINGPatrick Ohly
The overlap with the old SO_TIMESTAMP[NS] options is handled so that time stamping in software (net_enable_timestamp()) is enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE is set. It's disabled if all of these are off. Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-15net: infrastructure for hardware time stampingPatrick Ohly
The additional per-packet information (16 bytes for time stamps, 1 byte for flags) is stored for all packets in the skb_shared_info struct. This implementation detail is hidden from users of that information via skb_* accessor functions. A separate struct resp. union is used for the additional information so that it can be stored/copied easily outside of skb_shared_info. Compared to previous implementations (reusing the tstamp field depending on the context, optional additional structures) this is the simplest solution. It does not extend sk_buff itself. TX time stamping is implemented in software if the device driver doesn't support hardware time stamping. The new semantic for hardware/software time stamping around ndo_start_xmit() is based on two assumptions about existing network device drivers which don't support hardware time stamping and know nothing about it: - they leave the new skb_shared_tx unmodified - the keep the connection to the originating socket in skb->sk alive, i.e., don't call skb_orphan() Given that skb_shared_tx is new, the first assumption is safe. The second is only true for some drivers. As a result, software TX time stamping currently works with the bnx2 driver, but not with the unmodified igb driver (the two drivers this patch series was tested with). Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-14Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller
Conflicts: drivers/net/wireless/iwlwifi/iwl-agn.c drivers/net/wireless/iwlwifi/iwl3945-base.c
2009-02-14Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2009-02-13mac80211: split managed/ibss code a little moreJohannes Berg
It appears that you can completely mess up mac80211 in IBSS mode by sending it a disassoc or deauth: it'll stop queues and do a lot more but not ever do anything again. Fix this by not handling all those frames in IBSS mode, Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13mac80211: fix IBSS authJohannes Berg
The code beyond this point is supposed to be used for non-IBSS (managed) mode only. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Jouni Malinen <j@w1.fi> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13mac80211: calculate wstats_flags on the flyJohannes Berg
Just to make wext.c more self-contained. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13mac80211: use cfg80211s BSS infrastructureJohannes Berg
Remove all the code from mac80211 to keep track of BSSes and use the cfg80211-provided code completely. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13cfg80211: add more flexible BSS lookupJohannes Berg
Add a more flexible BSS lookup function so that mac80211 or other drivers can actually use this for getting the BSS to connect to. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-13cfg80211: allow users to request removing a BSSJohannes Berg
This patch introduces cfg80211_unlink_bss, a function to allow a driver to remove a BSS from the internal list and make it not show up in scan results any more -- this is to be used when the driver detects that the BSS is no longer available. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>