diff options
Diffstat (limited to 'Documentation/infiniband')
-rw-r--r-- | Documentation/infiniband/core_locking.txt | 114 | ||||
-rw-r--r-- | Documentation/infiniband/user_mad.txt | 53 |
2 files changed, 154 insertions, 13 deletions
diff --git a/Documentation/infiniband/core_locking.txt b/Documentation/infiniband/core_locking.txt new file mode 100644 index 00000000000..e1678542279 --- /dev/null +++ b/Documentation/infiniband/core_locking.txt @@ -0,0 +1,114 @@ +INFINIBAND MIDLAYER LOCKING + + This guide is an attempt to make explicit the locking assumptions + made by the InfiniBand midlayer. It describes the requirements on + both low-level drivers that sit below the midlayer and upper level + protocols that use the midlayer. + +Sleeping and interrupt context + + With the following exceptions, a low-level driver implementation of + all of the methods in struct ib_device may sleep. The exceptions + are any methods from the list: + + create_ah + modify_ah + query_ah + destroy_ah + bind_mw + post_send + post_recv + poll_cq + req_notify_cq + map_phys_fmr + + which may not sleep and must be callable from any context. + + The corresponding functions exported to upper level protocol + consumers: + + ib_create_ah + ib_modify_ah + ib_query_ah + ib_destroy_ah + ib_bind_mw + ib_post_send + ib_post_recv + ib_req_notify_cq + ib_map_phys_fmr + + are therefore safe to call from any context. + + In addition, the function + + ib_dispatch_event + + used by low-level drivers to dispatch asynchronous events through + the midlayer is also safe to call from any context. + +Reentrancy + + All of the methods in struct ib_device exported by a low-level + driver must be fully reentrant. The low-level driver is required to + perform all synchronization necessary to maintain consistency, even + if multiple function calls using the same object are run + simultaneously. + + The IB midlayer does not perform any serialization of function calls. + + Because low-level drivers are reentrant, upper level protocol + consumers are not required to perform any serialization. However, + some serialization may be required to get sensible results. For + example, a consumer may safely call ib_poll_cq() on multiple CPUs + simultaneously. However, the ordering of the work completion + information between different calls of ib_poll_cq() is not defined. + +Callbacks + + A low-level driver must not perform a callback directly from the + same callchain as an ib_device method call. For example, it is not + allowed for a low-level driver to call a consumer's completion event + handler directly from its post_send method. Instead, the low-level + driver should defer this callback by, for example, scheduling a + tasklet to perform the callback. + + The low-level driver is responsible for ensuring that multiple + completion event handlers for the same CQ are not called + simultaneously. The driver must guarantee that only one CQ event + handler for a given CQ is running at a time. In other words, the + following situation is not allowed: + + CPU1 CPU2 + + low-level driver -> + consumer CQ event callback: + /* ... */ + ib_req_notify_cq(cq, ...); + low-level driver -> + /* ... */ consumer CQ event callback: + /* ... */ + return from CQ event handler + + The context in which completion event and asynchronous event + callbacks run is not defined. Depending on the low-level driver, it + may be process context, softirq context, or interrupt context. + Upper level protocol consumers may not sleep in a callback. + +Hot-plug + + A low-level driver announces that a device is ready for use by + consumers when it calls ib_register_device(), all initialization + must be complete before this call. The device must remain usable + until the driver's call to ib_unregister_device() has returned. + + A low-level driver must call ib_register_device() and + ib_unregister_device() from process context. It must not hold any + semaphores that could cause deadlock if a consumer calls back into + the driver across these calls. + + An upper level protocol consumer may begin using an IB device as + soon as the add method of its struct ib_client is called for that + device. A consumer must finish all cleanup and free all resources + relating to a device before returning from the remove method. + + A consumer is permitted to sleep in its add and remove methods. diff --git a/Documentation/infiniband/user_mad.txt b/Documentation/infiniband/user_mad.txt index cae0c83f1ee..750fe5e80eb 100644 --- a/Documentation/infiniband/user_mad.txt +++ b/Documentation/infiniband/user_mad.txt @@ -28,13 +28,37 @@ Creating MAD agents Receiving MADs - MADs are received using read(). The buffer passed to read() must be - large enough to hold at least one struct ib_user_mad. For example: - - struct ib_user_mad mad; - ret = read(fd, &mad, sizeof mad); - if (ret != sizeof mad) + MADs are received using read(). The receive side now supports + RMPP. The buffer passed to read() must be at least one + struct ib_user_mad + 256 bytes. For example: + + If the buffer passed is not large enough to hold the received + MAD (RMPP), the errno is set to ENOSPC and the length of the + buffer needed is set in mad.length. + + Example for normal MAD (non RMPP) reads: + struct ib_user_mad *mad; + mad = malloc(sizeof *mad + 256); + ret = read(fd, mad, sizeof *mad + 256); + if (ret != sizeof mad + 256) { + perror("read"); + free(mad); + } + + Example for RMPP reads: + struct ib_user_mad *mad; + mad = malloc(sizeof *mad + 256); + ret = read(fd, mad, sizeof *mad + 256); + if (ret == -ENOSPC)) { + length = mad.length; + free(mad); + mad = malloc(sizeof *mad + length); + ret = read(fd, mad, sizeof *mad + length); + } + if (ret < 0) { perror("read"); + free(mad); + } In addition to the actual MAD contents, the other struct ib_user_mad fields will be filled in with information on the received MAD. For @@ -50,18 +74,21 @@ Sending MADs MADs are sent using write(). The agent ID for sending should be filled into the id field of the MAD, the destination LID should be - filled into the lid field, and so on. For example: + filled into the lid field, and so on. The send side does support + RMPP so arbitrary length MAD can be sent. For example: + + struct ib_user_mad *mad; - struct ib_user_mad mad; + mad = malloc(sizeof *mad + mad_length); - /* fill in mad.data */ + /* fill in mad->data */ - mad.id = my_agent; /* req.id from agent registration */ - mad.lid = my_dest; /* in network byte order... */ + mad->hdr.id = my_agent; /* req.id from agent registration */ + mad->hdr.lid = my_dest; /* in network byte order... */ /* etc. */ - ret = write(fd, &mad, sizeof mad); - if (ret != sizeof mad) + ret = write(fd, &mad, sizeof *mad + mad_length); + if (ret != sizeof *mad + mad_length) perror("write"); Setting IsSM Capability Bit |