aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/mthca
AgeCommit message (Collapse)Author
2007-07-18IB/mthca: Simplify use of size0 in work request postingRoland Dreier
Current code sets size0 to 0 at the start of work request posting functions and then handles size0 == 0 specially within the loop over work requests. Change this so size0 is set along with f0 the first time through the loop (when nreq == 0). This makes the code easier to understand by making it clearer that f0 and size0 are always initialized if nreq != 0 without having to know that size0 == 0 implies nreq == 0. Also annotate size0 with uninitialized_var() so that this doesn't introduce a new compiler warning. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18IB/mthca: Factor out setting WQE UD segment entriesRoland Dreier
Factor code to set UD entries out of the work request posting functions into inline functions set_tavor_ud_seg() and set_arbel_ud_seg(). This doesn't change the generated code in any significant way, and makes the source easier on the eyes. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18IB/mthca: Factor out setting WQE remote address and atomic segment entriesRoland Dreier
Factor code to set remote address and atomic segment entries out of the work request posting functions into inline functions set_raddr_seg() and set_atomic_seg(). This doesn't change the generated code in any significant way, and makes the source easier on the eyes. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18IB/mthca: Factor out setting WQE data segment entriesRoland Dreier
Factor code to set data segment entries out of the work request posting functions into inline functions mthca_set_data_seg() and mthca_set_data_seg_inval(). This makes the code more readable and also allows the compiler to do a better job -- on x86_64: add/remove: 0/0 grow/shrink: 0/6 up/down: 0/-69 (-69) function old new delta mthca_arbel_post_srq_recv 373 369 -4 mthca_arbel_post_receive 570 562 -8 mthca_tavor_post_srq_recv 520 508 -12 mthca_tavor_post_send 1344 1330 -14 mthca_arbel_post_send 1481 1467 -14 mthca_tavor_post_receive 792 775 -17 Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17IB/mthca: Use uninitialized_var() for f0Roland Dreier
Commit 9db48926 ("drivers/infiniband/hw/mthca/mthca_qp: kill uninit'd var warning") added "= 0" to the declarations of f0 to shut up gcc warnings. However, there's no point in making the code bigger by initializing f0 to a random value just to get rid of a warning; setting f0 to 0 is no safer than just using uninitialized_var(), which documents the situation better and gives smaller code too. For example, on x86_64: add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-16 (-16) function old new delta mthca_tavor_post_send 1352 1344 -8 mthca_arbel_post_send 1489 1481 -8 Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17IB/mthca: Fix printk format used for firmware version in warningRoland Dreier
When warning about out-of-date firmware, current mthca code messes up the formatting of the version if the subminor doesn't have three digits. It doesn't fill the field with 0s so we end up with: ib_mthca 0000:0b:00.0: HCA FW version 1.1. 0 is old (1.2. 0 is current). Change the format from "%3d" to "%03d" to get the right thing printed. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17IB/mthca: Schedule MSI support for removalRoland Dreier
The mthca driver supports both MSI and MSI-X. However, MSI-X works with all hardware that the driver handles, and provides a superset of what MSI does, so there's no point in having code for both. Schedule MSI support for removal in 2008 to give anyone who actually needs MSI and who can't use MSI time to speak up. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17drivers/infiniband/hw/mthca/mthca_qp: kill uninit'd var warningJeff Garzik
drivers/infiniband/hw/mthca/mthca_qp.c: In function ‘mthca_tavor_post_send’: drivers/infiniband/hw/mthca/mthca_qp.c:1594: warning: ‘f0’ may be used uninitialized in this function drivers/infiniband/hw/mthca/mthca_qp.c: In function ‘mthca_arbel_post_send’: drivers/infiniband/hw/mthca/mthca_qp.c:1949: warning: ‘f0’ may be used uninitialized in this function Initializing 'f0' is not strictly necessary in either case, AFAICS. I was considering use of uninitialized_var(), but looking at the complex flow of control in each function, I feel it is wiser and safer to simply zero the var and be certain of ourselves. Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-10IB/mthca: Replace memset(<addr>, 0, PAGE_SIZE) with clear_page(<addr>)Shani Moideen
Signed-off-by: Shani Moideen <shani.moideen@wipro.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> ----
2007-07-09IB: Use menuconfig for InfiniBand menuJan Engelhardt
Change Kconfig objects from "menu, config" into "menuconfig" so that the user can disable the whole feature without having to enter the menu first. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07IB/mthca, mlx4_core: Fix typo in commentRoland Dreier
s/signifant/significant/ Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29IB/mthca: Fix handling of send CQE with error for QPs connected to SRQMichael S. Tsirkin
mthca_free_err_wqe() currently treats both send and receive CQEs identically if a QP is using an SRQ. But for Tavor hardware, send CQEs with error can be chained together even if the RQ is part of SRQ, so we may miss some CQEs. Fix by following the WQE chain for all send CQEs even for non-SRQ QPs. This fixes crashes in IPoIB CM: <https://bugs.openfabrics.org//show_bug.cgi?id=604> Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21Merge branch 'for-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: IB/cm: Improve local id allocation IPoIB/cm: Fix SRQ WR leak IB/ipoib: Fix typos in error messages IB/mlx4: Check if SRQ is full when posting receive IB/mlx4: Pass send queue sizes from userspace to kernel IB/mlx4: Fix check of opcode in mlx4_ib_post_send() mlx4_core: Fix array overrun in dump_dev_cap_flags() IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions IB/mthca: Fix RESET to ERROR transition IB/mlx4: Set GRH:HopLimit when sending globally routed MADs IB/mthca: Set GRH:HopLimit when building MLX headers IB/mlx4: Fix check of max_qp_dest_rdma in modify QP IB/mthca: Fix use-after-free on device restart IB/ehca: Return proper error code if register_mr fails IPoIB: Handle P_Key table reordering IB/core: Use start_port() and end_port() IB/core: Add helpers for uncached GID and P_Key searches IB/ipath: Fix potential deadlock with multicast spinlocks IB/core: Free umem when mm is already gone
2007-05-21Detach sched.h from mm.hAlexey Dobriyan
First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-19IB/mthca: Fix RESET to ERROR transitionMichael S. Tsirkin
According to the IB spec, a QP can be moved from RESET to the ERROR state, but mthca firmware does not support this and returns an error if we try. Work around this FW limitation by moving the QP from RESET to INIT with dummy parameters and then transitioning from INIT to ERROR. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/mthca: Set GRH:HopLimit when building MLX headersRolf Manderscheid
Global CM packets used by rmda_cm were being sent with a GRH:hopLimit of zero, causing them to be dropped by the router. The problem is a missing initialization of the hop_limit field in mthca_read_ah(), which was called by build_mlx_header() when sending a MAD on QP1. Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/mthca: Fix use-after-free on device restartAli Ayoub
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IB/mthca: Set cleaned CQEs back to HW ownership when cleaning CQMichael S. Tsirkin
mthca_cq_clean() updates the CQ consumer index without moving CQEs back to HW ownership. As a result, the same WRID might get reported twice, resulting in a use-after-free. This was observed in IPoIB CM. Fix by moving all freed CQEs to HW ownership. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=617> Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IB/mthca: Fix posting >255 recv WRs for TavorMichael S. Tsirkin
Fix posting lists of > 255 receive WRs for Tavor: rq.next_ind must be updated each doorbell, otherwise the next doorbell will use an incorrect index. Found by Ronni Zimmermann at Mellanox. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08IB/uverbs: Export ib_umem_get()/ib_umem_release() to modulesRoland Dreier
Export ib_umem_get()/ib_umem_release() and put low-level drivers in control of when to call ib_umem_get() to pin and DMA map userspace, rather than always calling it in ib_uverbs_reg_mr() before calling the low-level driver's reg_user_mr method. Also move these functions to be in the ib_core module instead of ib_uverbs, so that driver modules using them do not depend on ib_uverbs. This has a number of advantages: - It is better design from the standpoint of making generic code a library that can be used or overridden by device-specific code as the details of specific devices dictate. - Drivers that do not need to pin userspace memory regions do not need to take the performance hit of calling ib_mem_get(). For example, although I have not tried to implement it in this patch, the ipath driver should be able to avoid pinning memory and just use copy_{to,from}_user() to access userspace memory regions. - Buffers that need special mapping treatment can be identified by the low-level driver. For example, it may be possible to solve some Altix-specific memory ordering issues with mthca CQs in userspace by mapping CQ buffers with extra flags. - Drivers that need to pin and DMA map userspace memory for things other than memory regions can use ib_umem_get() directly, instead of hacks using extra parameters to their reg_phys_mr method. For example, the mlx4 driver that is pending being merged needs to pin and DMA map QP and CQ buffers, but it does not need to create a memory key for these buffers. So the cleanest solution is for mlx4 to call ib_umem_get() in the create_qp and create_cq methods. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-07Merge branch 'for-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: IPoIB: Convert to NAPI IB: Return "maybe missed event" hint from ib_req_notify_cq() IB: Add CQ comp_vector support IB/ipath: Fix a race condition when generating ACKs IB/ipath: Fix two more spin lock problems IB/fmr_pool: Add prefix to all printks IB/srp: Set proc_name IB/srp: Add orig_dgid sysfs attribute to scsi_host IPoIB/cm: Don't crash if remote side uses one QP for both directions RDMA/cxgb3: Support for new abort logic RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message RDMA/cxgb3: Fail qp creation if the requested max_inline is too large RDMA/cxgb3: Fix TERM codes IPoIB/cm: Fix error handling in ipoib_cm_dev_open() IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed IB/mthca: Work around kernel QP starvation IB/ipath: Don't put QP in timeout queue if waiting to send IB/ipath: Don't call spin_lock_irq() from interrupt context
2007-05-06IB: Return "maybe missed event" hint from ib_req_notify_cq()Roland Dreier
The semantics defined by the InfiniBand specification say that completion events are only generated when a completions is added to a completion queue (CQ) after completion notification is requested. In other words, this means that the following race is possible: while (CQ is not empty) ib_poll_cq(CQ); // new completion is added after while loop is exited ib_req_notify_cq(CQ); // no event is generated for the existing completion To close this race, the IB spec recommends doing another poll of the CQ after requesting notification. However, it is not always possible to arrange code this way (for example, we have found that NAPI for IPoIB cannot poll after requesting notification). Also, some hardware (eg Mellanox HCAs) actually will generate an event for completions added before the call to ib_req_notify_cq() -- which is allowed by the spec, since there's no way for any upper-layer consumer to know exactly when a completion was really added -- so the extra poll of the CQ is just a waste. Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for ib_req_notify_cq() so that it can return a hint about whether the a completion may have been added before the request for notification. The return value of ib_req_notify_cq() is extended so: < 0 means an error occurred while requesting notification == 0 means notification was requested successfully, and if IB_CQ_REPORT_MISSED_EVENTS was passed in, then no events were missed and it is safe to wait for another event. > 0 is only returned if IB_CQ_REPORT_MISSED_EVENTS was passed in. It means that the consumer must poll the CQ again to make sure it is empty to avoid the race described above. We add a flag to enable this behavior rather than turning it on unconditionally, because checking for missed events may incur significant overhead for some low-level drivers, and consumers that don't care about the results of this test shouldn't be forced to pay for the test. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06IB: Add CQ comp_vector supportMichael S. Tsirkin
Add a num_comp_vectors member to struct ib_device and extend ib_create_cq() to pass in a comp_vector parameter -- this parallels the userspace libibverbs API. Update all hardware drivers to set num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector value. Pass the value of num_comp_vectors to userspace rather than hard-coding a value of 1. We want multiple CQ event vector support (via MSI-X or similar for adapters that can generate multiple interrupts), but it's not clear how many vectors we want, or how we want to deal with policy issues such as how to decide which vector to use or how to set up interrupt affinity. This patch is useful for experimenting, since no core changes will be necessary when updating a driver to support multiple vectors, and we know that we want to make at least these changes anyway. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-02PCI: Cleanup the includes of <linux/pci.h>Jean Delvare
I noticed that many source files include <linux/pci.h> while they do not appear to need it. Here is an attempt to clean it all up. In order to find all possibly affected files, I searched for all files including <linux/pci.h> but without any other occurence of "pci" or "PCI". I removed the include statement from all of these, then I compiled an allmodconfig kernel on both i386 and x86_64 and fixed the false positives manually. My tests covered 66% of the affected files, so there could be false positives remaining. Untested files are: arch/alpha/kernel/err_common.c arch/alpha/kernel/err_ev6.c arch/alpha/kernel/err_ev7.c arch/ia64/sn/kernel/huberror.c arch/ia64/sn/kernel/xpnet.c arch/m68knommu/kernel/dma.c arch/mips/lib/iomap.c arch/powerpc/platforms/pseries/ras.c arch/ppc/8260_io/enet.c arch/ppc/8260_io/fcc_enet.c arch/ppc/8xx_io/enet.c arch/ppc/syslib/ppc4xx_sgdma.c arch/sh64/mach-cayman/iomap.c arch/xtensa/kernel/xtensa_ksyms.c arch/xtensa/platform-iss/setup.c drivers/i2c/busses/i2c-at91.c drivers/i2c/busses/i2c-mpc.c drivers/media/video/saa711x.c drivers/misc/hdpuftrs/hdpu_cpustate.c drivers/misc/hdpuftrs/hdpu_nexus.c drivers/net/au1000_eth.c drivers/net/fec_8xx/fec_main.c drivers/net/fec_8xx/fec_mii.c drivers/net/fs_enet/fs_enet-main.c drivers/net/fs_enet/mac-fcc.c drivers/net/fs_enet/mac-fec.c drivers/net/fs_enet/mac-scc.c drivers/net/fs_enet/mii-bitbang.c drivers/net/fs_enet/mii-fec.c drivers/net/ibm_emac/ibm_emac_core.c drivers/net/lasi_82596.c drivers/parisc/hppb.c drivers/sbus/sbus.c drivers/video/g364fb.c drivers/video/platinumfb.c drivers/video/stifb.c drivers/video/valkyriefb.c include/asm-arm/arch-ixp4xx/dma.h sound/oss/au1550_ac97.c I would welcome test reports for these files. I am fine with removing the untested files from the patch if the general opinion is that these changes aren't safe. The tested part would still be nice to have. Note that this patch depends on another header fixup patch I submitted to LKML yesterday: [PATCH] scatterlist.h needs types.h http://lkml.org/lkml/2007/3/01/141 Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-04-30IB/mthca: Work around kernel QP starvationMichael S. Tsirkin
With mthca, RC QPs can starve each other and even UD QPs on the same hardware schedule queue. As a result, userspace MPI can starve e.g. IPoIB traffic, with netdev watchdog warnings getting printed out, and TCP connections getting stuck or failing. Reduce the chance of this happening by using three separate hardware schedule queues: one for userspace RC QPs, one for kernel RC QPs, and one for all other QPs. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24IB: Set class_dev->dev in core for nice device symlinkJoachim Fenkes
All RDMA drivers except ehca set class_dev->dev to their dma_device value (ehca leaves this unset). dma_device is the only value that makes any sense, so move this assignment to core/sysfs.c. This reduce the duplicated code in the rest of the drivers and gives ehca a nice /sys/class/infiniband/ehcaX/device symlink. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24IB/mthca: Simplify CQ cleaning in mthca_free_qp()Roland Dreier
mthca_free_qp() already has local variables to hold the QP's send_cq and recv_cq, so we can slightly clean up the calls to mthca_cq_clean() by using those local variables instead of expressions like to_mcq(qp->ibqp.send_cq). Also, by cleaning the recv_cq first, we can avoid worrying about whether the QP is attached to an SRQ for the second call, because we would only clean send_cq if send_cq is not equal to recv_cq, and that means send_cq cannot have any receive completions from the QP being destroyed. All this work even improves the generated code a bit: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5) function old new delta mthca_free_qp 510 505 -5 Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memoryRoland Dreier
Commit b2875d4c ("IB/mthca: Always fill MTTs from CPU") causes a crash in mthca_write_mtt() with non-memfree HCAs that have their memory hidden (that is, have only two PCI BARs instead of having a third BAR that allows access to the RAM attached to the HCA) on 64-bit architectures. This is because the commit just before, c20e20ab ("IB/mthca: Merge MR and FMR space on 64-bit systems") makes dev->mr_table.fmr_mtt_buddy equal to &dev->mr_table.mtt_buddy and hence mthca_write_mtt() tries to write directly into the HCA's MTT table. However, since that table is in the HCA's memory, this is impossible without the PCI BAR that gives access to that memory. This causes a crash because mthca_tavor_write_mtt_seg() basically tries to dereference some offset of a NULL pointer. Fix this by adding a test of MTHCA_FLAG_FMR in mthca_write_mtt() so that we always use the WRITE_MTT firmware command rather than writing directly if FMRs are not enabled. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18IB/mthca: Update HCA firmware revisionsRoland Dreier
Update the driver's list of current firmware versions with Mellanox's latest releases. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-16IB/mthca: Fix data corruption after FMR unmap on SinaiMichael S. Tsirkin
In mthca_arbel_fmr_unmap(), the high bits of the key are masked off. This gets rid of the effect of adjust_key(), which makes sure that bits 3 and 23 of the key are equal when the Sinai throughput optimization is enabled, and so it may happen that an FMR will end up with bits 3 and 23 in the key being different. This causes data corruption, because when enabling the throughput optimization, the driver promises the HCA firmware that bits 3 and 23 of all memory keys will always be equal. Fix by re-applying adjust_key() after masking the key. Thanks to Or Gerlitz for reproducing the problem, and Ariel Shahar for help in debug. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-26IB/mthca: Fix thinko in init_mr_table()Michael S. Tsirkin
Commit c20e20ab ("IB/mthca: Merge MR and FMR space on 64-bit systems") swapped the number of MTTs and MPTs when initializing the MR table. As a result, we get a kernel oops when the number of MTT segments allocated exceeds 0x20000. Noted by Troy Benjegerdes <troy@scl.ameslab.gov>, and reproduced by Dotan Barak <dotanb@mellanox.co.il>. This fixes https://bugs.openfabrics.org/show_bug.cgi?id=490 Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-01IB/mthca: Fix error path in mthca_alloc_memfree()Roland Dreier
The garbled logic in mthca_alloc_memfree() causes it to return 0, even if it fails to allocate all doorbell records. Fix it to return -ENOMEM when it fails. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-20IB/mthca: Make 2 functions staticAdrian Bunk
This patch makes the needlessly global functions mthca_tavor_write_mtt_seg() and mthca_arbel_write_mtt_seg() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16IB/mthca: Fix allocation of ICM chunks in coherent memoryRoland Dreier
The change to allow allocating ICM chunks from coherent memory did not increment the count of sg entries properly, so a chunk that required more than allocation would not be mapped properly by the HCA. Fix this by adding the missing increment of chunk->nsg. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16IB/mthca: Allow the QP state transition RESET->RESETDotan Barak
RESET->RESET is an allowed QP state transition, so mthca should handle it correctly, by just returning success without involving the firmware. Signed-off-by: Dotan Barak <dotanb@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12IB/mthca: Always fill MTTs from CPUMichael S. Tsirkin
Speed up memory registration by filling in MTTs directly when the CPU can write directly to the whole table (all mem-free cards, and to Tavor mode on 64-bit systems with the patch I posted earlier). This reduces the number of FW commands needed to register an MR by at least a factor of 2 and speeds up memory registration significantly. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12IB/mthca: Merge MR and FMR space on 64-bit systemsMichael S. Tsirkin
For Tavor, we currently reserve separate MPT and MTT space for FMRs to avoid abusing the vmalloc space on 32 bit kernels. No such problem exists on 64 bit kernels so let's not do it there. This way we have a shared pool for MR and FMR resources, used on demand. This will also make it possible to write MTTs for regular regions directly from driver. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUsMichael S. Tsirkin
We allocate the MTT table with alloc_pages() and then do pci_map_sg(), so we must call pci_dma_sync_sg() after the CPU writes to the MTT table. This works since the device will never write MTTs on mem-free HCAs, once we get rid of the use of the WRITE_MTT firmware command. This change is needed to make that work, and is an improvement for now, since it gives FMRs a chance at working. For MPTs, both the device and CPU might write there, so we must allocate DMA coherent memory for these. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12IB/mthca: Give reserved MTTs a separate cache lineMichael S. Tsirkin
MTTs are allocated in non-cache-coherent memory, so we must give reserved MTTs their own cache line, to prevent both device and CPU from writing into the same cache line at the same time. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12IB/mthca: Fix reserved MTTs calculation on mem-free HCAsMichael S. Tsirkin
The reserved_mtts field has different meaning in Tavor and Arbel, so we are wasting mtt entries on memfree. Fix the Arbel case to match Tavor semantics. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10IB/mthca: Work around gcc bug on sparc64David Howells
For some reason gcc-3.4.5 on sparc64 does: WARNING: "____ilog2_NaN" [drivers/infiniband/hw/mthca/ib_mthca.ko] undefined! Points to note: (1) The asm volatile flush/flushw are just markers for viewing what comes out in the assembly; removing them has no effect on the result. (2) Changing almost anything else in dwh__mthca_arbel_init_srq_context() or dwh__mthca_alloc_srq() causes the problem to go away. The compiler command line issued by the kernel build is: /opt/crosstool/gcc-3.4.5-glibc-2.3.6/sparc64-unknown-linux-gnu/bin/sparc64-unknown-linux-gnu-gcc -fno-strict-aliasing -fno-common -Os -m64 -mno-fpu -mcpu=ultrasparc -mcmodel=medlow -ffixed-g4 -ffixed-g5 -fcall-used-g7 -Wa,--undeclared-regs -pg -fno-omit-frame-pointer -fno-optimize-sibling-calls -fasynchronous-unwind-tables -g -c -o drivers/infiniband/hw/mthca/.tmp_mthca_srq.o drivers/infiniband/hw/mthca/mthca_srq.c This can be reduced to this whilst still retaining the problem: /opt/crosstool/gcc-3.4.5-glibc-2.3.6/sparc64-unknown-linux-gnu/bin/sparc64-unknown-linux-gnu-gcc -m64 -c -o drivers/infiniband/hw/mthca/mthca_srq.o drivers/infiniband/hw/mthca/mthca_srq.c -Os Removing -Os or changing it to -O or -O0 thru -O6 gets rid of the problem. This patch to the kernel code fixes the problem: Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10IB/mthca: Use correct structure size in call to memset()Roland Dreier
When clearing the ib_ah_attr parameter in to_ib_ah_attr(), use sizeof *ib_ah_attr instead of sizeof *path. Pointed out by Jack Morgenstein <jackm@mellanox.co.il>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04IB: Return qp pointer as part of ib_wcMichael S. Tsirkin
struct ib_wc currently only includes the local QP number: this matches the IB spec, but seems mostly useless. The following patch replaces this with the pointer to qp itself, and updates all low level drivers and all users. This has the following advantages: - Ability to get a per-qp context through wc->qp->qp_context - Existing drivers already have the qp pointer ready in poll cq, so this change actually saves a tiny bit (extra memory read) on data path (for ehca it would actually be expensive to find the QP pointer when polling a CQ, but ehca does not support SRQ so we can leave wc->qp as NULL for ehca) - Users that need the QP number can still get it through wc->qp->qp_num Use case: In IPoIB connected mode code, I have a common CQ shared by multiple QPs. To track connection usage, I need a way to get at some per-QP context upon the completion, and I would like to avoid allocating context object per work request just to stick a QP pointer into it. With this code, I can just use wc->qp->qp_context. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-09IB/mthca: Don't execute QUERY_QP firmware command for QP in RESET stateDotan Barak
If a QP being queried is in the RESET state, don't execute the QUERY_QP firmware command (because it will fail). Signed-off-by: Dotan Barak <dotanb@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07IB/mthca: Fix PRM compliance problem in atomic-send completionsJack Morgenstein
According to the Tavor and Arbel programmer's reference manuals, the number of bytes transferred is not provided in the byte_cnt field of the CQ entry for atomic operation completions. For atomic operations, the number of bytes transferred is always 8 (when the status is "success"), and this constant value should always be used by the driver in the ib_wc entry returned, rather than using the CQE. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-04IB/mthca: Fix off-by-one in FMR handling on memfreeMichael S. Tsirkin
mthca_table_find() will return the wrong address when the table entry being searched for is exactly at the beginning of a sglist entry (other than the first), because it uses >= when it should use >. Example: assume we have 2 entries in scatterlist, 4K each, offset is 4K. The current code will return first entry + 4K when we really want the second entry. In particular this means mapping an FMR on a memfree HCA may end up writing the page table into the wrong place, leading to memory corruption and also causing the HCA to use an incorrect address translation table. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-30[PATCH] IB/mthca: Fix FMR breakage caused by kmemdup() conversionMichael S. Tsirkin
Commit bed8bdfd ("IB: kmemdup() cleanup") introduced one bad conversion to kmemdup() in mthca_alloc_fmr(), where the structure allocated and the structure copied are not the same size. Revert this back to the original kmalloc()/memcpy() code. Reported-by: Dotan Barak <dotanb@mellanox.co.il>. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <roland@digitalvampire.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-15IB/mthca: Use DEFINE_MUTEX() instead of mutex_init()Roland Dreier
mthca_device_mutex() can be initialized automatically with DEFINE_MUTEX() rather than explicitly calling mutex_init(). This saves a bit of text and shrinks the source by a line, so we may as well do it.... Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-15IB/mthca: Add HCA profile module parametersLeonid Arsh
Add module parameters that enable settting some of the HCA profile values, such as the number of QPs, CQs, etc. Signed-off-by: Leonid Arsh <leonida@voltaire.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-08[PATCH] LOG2: Implement a general integer log2 facility in the kernelDavid Howells
This facility provides three entry points: ilog2() Log base 2 of unsigned long ilog2_u32() Log base 2 of u32 ilog2_u64() Log base 2 of u64 These facilities can either be used inside functions on dynamic data: int do_something(long q) { ...; y = ilog2(x) ...; } Or can be used to statically initialise global variables with constant values: unsigned n = ilog2(27); When performing static initialisation, the compiler will report "error: initializer element is not constant" if asked to take a log of zero or of something not reducible to a constant. They treat negative numbers as unsigned. When not dealing with a constant, they fall back to using fls() which permits them to use arch-specific log calculation instructions - such as BSR on x86/x86_64 or SCAN on FRV - if available. [akpm@osdl.org: MMC fix] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Howells <dhowells@redhat.com> Cc: Wojtek Kaniewski <wojtekka@toxygen.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>