aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband/ulp/ipoib
AgeCommit message (Collapse)Author
2008-01-25IPoIB: Constify seq_operations function pointer tablesJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB: Remove redundant check of netif_queue_stopped() in xmit handlerKrishna Kumar
qdisc_run() now tests for queue_stopped() before calling __qdisc_run(), and the same check is done in every iteration of __qdisc_run(), so another check is not required in the driver xmit. This means that ipoib_start_xmit() no longer needs to test netif_queue_stopped(); the test was added to fix earlier kernels, where the networking stack did not guarantee that the xmit method of an LLTX driver would not be called after the queue was stopped, but current kernels do provide this guarantee. To validate, I put a debug in the TX_BUSY path which never hit with 64 threads running overnight exercising this code a few 100 million times. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB/CM: Enable SRQ support on HCAs that support fewer than 16 SG entriesPradeep Satyanarayana
Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG entries for SRQs. Currently IPoIB/CM implicitly assumes all HCAs will support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This patch removes that restriction by limiting the maximum MTU in connected mode to what the maximum number of SRQ SG entries allows. This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728> Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB: improve IPv4/IPv6 to IB mcast mapping functionsRolf Manderscheid
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't use link-local scope in multicast GIDs. The existing routines that map IP/IPv6 multicast addresses into IB link-level addresses hard-code the scope to link-local, and they also leave the partition key field uninitialised. This patch adds a parameter (the link-level broadcast address) to the mapping routines, allowing them to initialise both the scope and the P_Key appropriately, and fixes up the call sites. The next step will be to add a way to configure the scope for an IPoIB interface. Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB/cm: Add connected mode support for devices without SRQsPradeep Satyanarayana
Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared receive queues). The current IPoIB connected mode support only works on devices that support SRQs. Fix this by adding support for using the receive queue of each connected mode receive QP. The disadvantage of this compared to using an SRQ is that it means a full queue of receives must be posted for each remote connected mode peer, which means that total memory usage is potentially much higher than when using SRQs. To manage this, add a new module parameter "max_nonsrq_conn_qp" that limits the number of connections allowed per interface. The rest of the changes are fairly straightforward: we use a table of struct ipoib_cm_rx to hold all the active connections, and put the table index of the connection in the high bits of receive WR IDs. This is needed because we cannot rely on the struct ib_wc.qp field for non-SRQ receive completions. Most of the rest of the changes just test whether or not an SRQ is available, and post receives or find received packets in the right place depending on the answer. Cleaning up dead connections actually becomes simpler, because we do not have to do the "last WQE reached" dance that is required to destroy QPs attached to an SRQ. We just move the QP to the error state and wait for all pending receives to be flushed. Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> [ Completely rewritten and split up, based on Pradeep's work. Several bugs fixed and no doubt several bugs introduced. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()Roland Dreier
Factor out the code for going through the rx_reap list of struct ipoib_cm_rx and freeing each one. This consolidates the code duplicated between ipoib_cm_dev_stop() and ipoib_cm_rx_reap() and reduces the risk of error when adding additional accounting. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB/cm: Factor out ipoib_cm_create_srq()Roland Dreier
Factor out the code to create an SRQ and allocate the receive ring in ipoib_cm_dev_init() into a new function ipoib_cm_create_srq(). This will make the code neater when support for devices that don't implement SRQs is added. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB/cm: Factor out ipoib_cm_free_rx_ring()Roland Dreier
Factor out the code to unmap/free skbs and free the receive ring in ipoib_cm_dev_cleanup() into a new function ipoib_cm_free_rx_ring(). This function will be called from a couple of other places when support for devices that don't implement SRQs is added. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IPoIB: Trivial formatting cleanupsRoland Dreier
Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-27IPoIB: Fix oops if xmit is called when priv->broadcast is NULLJack Morgenstein
If a port goes down, ipoib_ib_dev_down() is invoked -- which flushes the mcasts (clearing priv->broadcast) and clearing the path record cache. If ipoib_start_xmit() is then invoked (before the broadcast group is rejoined), a kernel oops results from attempting to access priv->broadcast, which is still unset. Returning NULL from path_rec_create() if priv->broadcast is NULL is a harmless way of bypassing the problem -- the offending packet is simply discarded "without prejudice." Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-26IPoIB/cm: Fix receive QP cleanupRoland Dreier
Commit 1b524963 ("IPoIB/cm: Use common CQ for CM send completions") changed how the high-order bits of work request IDs were used, which had the effect that IPOIB_CM_RX_DRAIN_WRID was no longer handled as a connected mode receive completion. This leads to the messages ib1: cm send completion event with wrid 1073741823 (> 64) ib1: RX drain timing out when an interface with connected mode QPs is brought down. Fix this by making sure that both IPOIB_OP_CM and IPOIB_OP_RECV are set in IPOIB_CM_RX_DRAIN_WRID. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: mlx4_core: Increase command timeout for INIT_HCA to 10 seconds IPoIB/cm: Use common CQ for CM send completions IB/uverbs: Fix checking of userspace object ownership IB/mlx4: Sanity check userspace send queue sizes IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))" IB/ehca: Enable large page MRs by default IB/ehca: Change meaning of hca_cap_mr_pgsize IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr() IB/ehca: Fix masking error in {,re}reg_phys_mr() IB/ehca: Supply QP token for SRQ base QPs IPoIB: Use round_jiffies() for ah_reap_task RDMA/cma: Fix deadlock destroying listen requests RDMA/cma: Add locking around QP accesses IB/mthca: Avoid alignment traps when writing doorbells mlx4_core: Kill mlx4_write64_raw()
2007-10-19IPoIB/cm: Use common CQ for CM send completionsMichael S. Tsirkin
Use the same CQ for CM send completions as for all other IPoIB completions. This means all completions are processed via the same NAPI polling routine. This should help reduce the number of interrupts for bi-directional traffic (such as TCP) and fixes "driver is hogging interrupts" errors reported for IPoIB send side, e.g. <https://bugs.openfabrics.org/show_bug.cgi?id=508> To do this, keep a per-interface counter of outstanding send WRs, and stop the interface when this counter reaches the send queue size to avoid CQ overruns. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-17IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"Roland Dreier
It's too hard to figure out what "!likely(...)" really means, and who knows how compilers interpret the hint. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-16IPoIB: Use round_jiffies() for ah_reap_taskAnton Blanchard
Use round_jiffies() to align the 1 second ah_reap_task with other work and potentially save power by sleeping cores for longer. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-15IB/ipoib: Verify address handle validity on sendMoni Shoua
When the bonding device senses a carrier loss of its active slave it replaces that slave with a new one. In between the times when the carrier of an IPoIB device goes down and ipoib_neigh is destroyed, it is possible that the bonding driver will send a packet on a new slave that uses an old ipoib_neigh. This patch detects and prevents this from happenning. Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Roland Dreier <rdreier@cisco.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-15IB/ipoib: Bound the net device to the ipoib_neigh structueMoni Shoua
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb->dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Under this scheme, ipoib_neigh_destructor assumption that for each struct neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev) can be casted to struct ipoib_dev_priv is buggy. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one, when n->dev->flags has the IFF_MASTER bit set. Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Roland Dreier <rdreier@cisco.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits) mlx4_core: Fix section mismatches IPoIB: Allow setting policy to ignore multicast groups IB/mthca: Mark error paths as unlikely() in post_srq_recv functions IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages. IB/ipath: Remove redundant link state checks IB/ipath: Fix IB_EVENT_PORT_ERR event IB/ipath: Better handling of unexpected GPIO interrupts IB/ipath: Maintain active time on all chips IB/ipath: Fix QHT7040 serial number check IB/ipath: Indicate a couple of chip bugs to userspace IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close IB/ipath: Remove duplicate copy of LMC IB/ipath: Add ability to set the LMC via the sysfs debugging interface IB/ipath: Optimize completion queue entry insertion and polling IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED IB/ipath: Generate flush CQE when QP is in error state IB/ipath: Remove redundant code IB/ipath: Future proof eeprom checksum code (contents reading) IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate ...
2007-10-10IPoIB: Fix unused variable warningRoland Dreier
The conversion to use netdevice internal stats left an unused variable in ipoib_neigh_free(), since there's no longer any reason to get netdev_priv() in order to increment dropped packets. Delete the unused priv variable. Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-10[IPoIB]: Convert to netdevice internal statsRoland Dreier
Use the stats member of struct netdevice in IPoIB, so we can save memory by deleting the stats member of struct ipoib_dev_priv, and save code by deleting ipoib_get_stats(). Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Move hardware header operations out of netdevice.Stephen Hemminger
Since hardware header operations are part of the protocol class not the device instance, make them into a separate object and save memory. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Nuke SET_MODULE_OWNER macro.Ralf Baechle
It's been a useless no-op for long enough in 2.6 so I figured it's time to remove it. The number of people that could object because they're maintaining unified 2.4 and 2.6 drivers is probably rather small. [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make NAPI polling independent of struct net_device objects.Stephen Hemminger
Several devices have multiple independant RX queues per net device, and some have a single interrupt doorbell for several queues. In either case, it's easier to support layouts like that if the structure representing the poll is independant from the net device itself. The signature of the ->poll() call back goes from: int foo_poll(struct net_device *dev, int *budget) to int foo_poll(struct napi_struct *napi, int budget) The caller is returned the number of RX packets processed (or the number of "NAPI credits" consumed if you want to get abstract). The callee no longer messes around bumping dev->quota, *budget, etc. because that is all handled in the caller upon return. The napi_struct is to be embedded in the device driver private data structures. Furthermore, it is the driver's responsibility to disable all NAPI instances in it's ->stop() device close handler. Since the napi_struct is privatized into the driver's private data structures, only the driver knows how to get at all of the napi_struct instances it may have per-device. With lots of help and suggestions from Rusty Russell, Roland Dreier, Michael Chan, Jeff Garzik, and Jamal Hadi Salim. Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra, Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan. [ Ported to current tree and all drivers converted. Integrated Stephen's follow-on kerneldoc additions, and restored poll_list handling to the old style to fix mutual exclusion issues. -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10IPoIB: Allow setting policy to ignore multicast groupsOr Gerlitz
The kernel IB stack allows (through the RDMA CM) userspace applications to join and use multicast groups from the IPoIB MGID range. This allows multicast traffic to be handled directly from userspace QPs, without going through the kernel stack, which gives better performance for some applications. However, to fully interoperate with IP multicast, such userspace applications need to participate in IGMP reports and queries, or else routers may not forward the multicast traffic to the system where the application is running. The simplest way to do this is to share the kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to join multicast groups that are being handled directly in userspace. However, in such cases, the actual multicast traffic should not also be handled by the IPoIB interface, because that would burn resources handling multicast packets that will just be discarded in the kernel. To handle this, this patch adds lookup on the database used for IB multicast group reference counting when IPoIB is joining multicast groups, and if a multicast group is already handled by user space, then the IPoIB kernel driver ignores the group. This is controlled by a per-interface policy flag. When the flag is set, IPoIB will not join and attach its QP to a multicast group which already has an entry in the database; when the flag is cleared, IPoIB will behave as before this change. For each IPoIB interface, the /sys/class/net/$intf/umcast attribute controls the policy flag. The default value is off/0. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09IPoIB/cm: Clean up initialization of QP attr in ipoib_cm_create_tx_qp()Dotan Barak
Make the way QP is being created in ipoib_cm_create_tx_qp() consistent with ipoib_cm_create_rx_qp(). Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09IPoIB: Specify Traffic Class with path record queries for QoS supportSean Hefty
To support QoS within and between subnets, modify IPoIB to request specific Traffic Class values with path record queries, using the value associated with the IPoIB broadcast group. Signed-off-by: Sean Hefty <sean.hefty@intel.com> [ See some comments I made on this at v1 and v2 of the posts <http://lists.openfabrics.org/pipermail/general/2007-August/039275.html> <http://lists.openfabrics.org/pipermail/general/2007-September/040312.html> ] Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09IPoIB: Fix error path memory leakEli Cohen
Clean up properly if ib_query_pkey() or ib_query_gid() fail. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09IPoIB: Fix typo to end statement with ';' instead of ','Eli Cohen
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09IPoIB: Make sure no receives are handled when stopping deviceRoland Dreier
The current IPoIB code might process receive completions from ipoib_drain_cq() when bringing down the interface. This could cause packets to be passed up the stack without the device's poll method being called. Avoid this by setting the status of any successful completions to IB_WC_WR_FLUSH_ERR. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-07IPoIB: Fix leak in ipoib_transport_dev_init() error pathJack Morgenstein
ipoib_transport_dev_init() calls ipoib_cm_dev_init(), so it needs to call ipoib_cm_dev_cleanup() to unwind that on the error path. Found by Dotan Barak of Mellanox. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10IB/cm: Include HCA ACK delay in local ACK timeoutSean Hefty
The IB CM should include the HCA ACK delay when calculating the local ACK timeout value to use for RC QPs. If the HCA ACK delay is large enough relative to the packet life time, then if it is not taken into account, the calculated timeout value ends up being too small, which can result in "retry exceeded" errors. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10IPoIB: Recycle loopback skbs instead of freeing and reallocatingRoland Dreier
InfiniBand HCAs replicate multicast packets back to the QP that sent them if that QP is attached to the destination multicast group. This means that IPoIB multicasts are often replicated back to the receive queue of the interface that generated them. To avoid confusing the network stack, we drop these duplicates within the IPoIB driver. However, there's no reason to free the skb that received the duplicate and then immediately allocate a new skb to post to the receive queue. We can be more efficient and just repost the same skb. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10IPoIB/cm: Fix warning if IPV6 is not enabledRoland Dreier
Fix drivers/infiniband/ulp/ipoib/ipoib_cm.c:1151: warning: unused variable 'dev' by getting rid of the variable dev, which is only used if CONFIG_IPV6 is enabled, and replacing the one use of it with the value it is assigned, namely priv->dev. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09IB: Use menuconfig for InfiniBand menuJan Engelhardt
Change Kconfig objects from "menu, config" into "menuconfig" so that the user can disable the whole feature without having to enter the menu first. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-02IPoIB/cm: Partial error clean up unmaps wrong addressRalph Campbell
If a page can't be allocated for the frag list of a skb, the code to unmap the partially allocated list is off by one. For exaple, if 'frags' equals one, i == 0, and the alloc_page() fails, then the old loop would have unmapped mapping[1] which is uninitialized. The same would happen if the call to ib_dma_map_page() failed. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21IPoIB/cm: Remove dead definition of struct ipoib_cm_idRoland Dreier
It's completely unused. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21IPoIB/cm: Fix interoperability when MTU doesn't matchMichael S. Tsirkin
IPoIB connected mode currently rejects a connection request unless the supported MTU is >= the local netdevice MTU. This breaks interoperability with implementations that might have tweaked IPOIB_CM_MTU, and there's real no longer a reason to do so: this test is just a leftover from when we did not tweak MTU per-connection. Fix this by making the test as permissive as possible. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21IPoIB/cm: Initialize RX before moving QP to RTRMichael S. Tsirkin
Fix a crasher bug in IPoIB CM: once a QP is in the RTR state, a receive completion (or even an asynchronous error) might be observed on this QP, so we have to initialize all of our receive data structures before moving to the RTR state. As an optimization (since modify_qp might take a long time), the jiffies update done when moving RX to the passive_ids list is also left in place to reduce the chance of the RX being misdetected as stale. This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=662>. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29IPoIB/cm: Fix performance regression on MellanoxMichael S. Tsirkin
commit 518b1646 ("IPoIB/cm: Fix SRQ WR leak") introduced a severe performance regression on Mellanox cards, because keeping a QP in the error state for extended periods of time moves hardware to the slow path (until the QP is destroyed). For example, MPI latency goes from ~3 usecs to ~7 usecs. Fix this by posting a send WR on one of the QPs that are being flushed, instead of using a separate drain QP that is kept in the error state. This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=636>, reported and bisected by Scott Weitzenkamp at Cisco and debugged by Sasha Mikheev at Voltaire. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24IPoIB/cm: Drain cq in ipoib_cm_dev_stop()Michael S. Tsirkin
Since NAPI polling is disabled while ipoib_cm_dev_stop() is running, ipoib_cm_dev_stop() must poll the CQ itself in order to see the packets draining. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24IPoIB/cm: Fix timeout check in ipoib_cm_dev_stop()Michael S. Tsirkin
time_after() was used backwards, so the timeout occurred immediately. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21IPoIB/cm: Fix SRQ WR leakMichael S. Tsirkin
SRQ WR leakage has been observed with IPoIB/CM: e.g. flipping ports on and off will, with time, leak out all WRs and then all connections will start getting RNR NAKs. Fix this in the way suggested by spec: move the QP being destroyed to the error state, wait for "Last WQE Reached" event and then post WR on a "drain QP" connected to the same CQ. Once we observe a completion on the drain QP, it's safe to call ib_destroy_qp. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21IB/ipoib: Fix typos in error messagesMichael S. Tsirkin
Trivial error message fixups. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IPoIB: Handle P_Key table reorderingYosef Etigin
SM reconfiguration or failover possibly causes a shuffling of the values in the P_Key table. Right now, IPoIB only queries for the P_Key index once when it creates the device QP, and hence there are problems if the index of a P_Key value changes. Fix this by using the PKEY_CHANGE event to trigger a recheck of the P_Key index. Signed-off-by: Yosef Etigin <yosefe@voltaire.com> Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IPoIB/cm: Optimize stale connection detectionMichael S. Tsirkin
In the presence of some running RX connections, we repeat queue_delayed_work calls each 4 RX WRs, which is a waste. It's enough to start stale task when a first passive connection is added, and rerun it every IPOIB_CM_RX_DELAY as long as there are outstanding passive connections. This removes some code from RX data path. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-07Merge branch 'for-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: IPoIB: Convert to NAPI IB: Return "maybe missed event" hint from ib_req_notify_cq() IB: Add CQ comp_vector support IB/ipath: Fix a race condition when generating ACKs IB/ipath: Fix two more spin lock problems IB/fmr_pool: Add prefix to all printks IB/srp: Set proc_name IB/srp: Add orig_dgid sysfs attribute to scsi_host IPoIB/cm: Don't crash if remote side uses one QP for both directions RDMA/cxgb3: Support for new abort logic RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message RDMA/cxgb3: Fail qp creation if the requested max_inline is too large RDMA/cxgb3: Fix TERM codes IPoIB/cm: Fix error handling in ipoib_cm_dev_open() IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed IB/mthca: Work around kernel QP starvation IB/ipath: Don't put QP in timeout queue if waiting to send IB/ipath: Don't call spin_lock_irq() from interrupt context
2007-05-06IPoIB: Convert to NAPIRoland Dreier
Convert the IP-over-InfiniBand network device driver over to using NAPI to handle completions for the main CQ. This covers all receives as well as datagram mode sends; send completions for connected mode connections are still handled from interrupt context. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06IB: Add CQ comp_vector supportMichael S. Tsirkin
Add a num_comp_vectors member to struct ib_device and extend ib_create_cq() to pass in a comp_vector parameter -- this parallels the userspace libibverbs API. Update all hardware drivers to set num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector value. Pass the value of num_comp_vectors to userspace rather than hard-coding a value of 1. We want multiple CQ event vector support (via MSI-X or similar for adapters that can generate multiple interrupts), but it's not clear how many vectors we want, or how we want to deal with policy issues such as how to decide which vector to use or how to set up interrupt affinity. This patch is useful for experimenting, since no core changes will be necessary when updating a driver to support multiple vectors, and we know that we want to make at least these changes anyway. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06IPoIB/cm: Don't crash if remote side uses one QP for both directionsMichael S. Tsirkin
The IPoIB CM spec allows the use of a single connection in both active->passive and passive->active directions. The current Linux code uses one connection for both directions, but if another node only uses one connection for both directions, we oops when we try to look up the passive connection. Fix by checking that qp_context is non-NULL before dereferencing it. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
2007-05-02PCI: Cleanup the includes of <linux/pci.h>Jean Delvare
I noticed that many source files include <linux/pci.h> while they do not appear to need it. Here is an attempt to clean it all up. In order to find all possibly affected files, I searched for all files including <linux/pci.h> but without any other occurence of "pci" or "PCI". I removed the include statement from all of these, then I compiled an allmodconfig kernel on both i386 and x86_64 and fixed the false positives manually. My tests covered 66% of the affected files, so there could be false positives remaining. Untested files are: arch/alpha/kernel/err_common.c arch/alpha/kernel/err_ev6.c arch/alpha/kernel/err_ev7.c arch/ia64/sn/kernel/huberror.c arch/ia64/sn/kernel/xpnet.c arch/m68knommu/kernel/dma.c arch/mips/lib/iomap.c arch/powerpc/platforms/pseries/ras.c arch/ppc/8260_io/enet.c arch/ppc/8260_io/fcc_enet.c arch/ppc/8xx_io/enet.c arch/ppc/syslib/ppc4xx_sgdma.c arch/sh64/mach-cayman/iomap.c arch/xtensa/kernel/xtensa_ksyms.c arch/xtensa/platform-iss/setup.c drivers/i2c/busses/i2c-at91.c drivers/i2c/busses/i2c-mpc.c drivers/media/video/saa711x.c drivers/misc/hdpuftrs/hdpu_cpustate.c drivers/misc/hdpuftrs/hdpu_nexus.c drivers/net/au1000_eth.c drivers/net/fec_8xx/fec_main.c drivers/net/fec_8xx/fec_mii.c drivers/net/fs_enet/fs_enet-main.c drivers/net/fs_enet/mac-fcc.c drivers/net/fs_enet/mac-fec.c drivers/net/fs_enet/mac-scc.c drivers/net/fs_enet/mii-bitbang.c drivers/net/fs_enet/mii-fec.c drivers/net/ibm_emac/ibm_emac_core.c drivers/net/lasi_82596.c drivers/parisc/hppb.c drivers/sbus/sbus.c drivers/video/g364fb.c drivers/video/platinumfb.c drivers/video/stifb.c drivers/video/valkyriefb.c include/asm-arm/arch-ixp4xx/dma.h sound/oss/au1550_ac97.c I would welcome test reports for these files. I am fine with removing the untested files from the patch if the general opinion is that these changes aren't safe. The tested part would still be nice to have. Note that this patch depends on another header fixup patch I submitted to LKML yesterday: [PATCH] scatterlist.h needs types.h http://lkml.org/lkml/2007/3/01/141 Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>