path: root/drivers/firewire
Age | Commit message | Author
2008-04-18 | firewire: reread config ROM when device reset the bus | Stefan Richter
When a device changes its configuration ROM, it announces this with a bus reset. firewire-core has to check which node initiated a bus reset and whether any unit directories went away or were added on this node.
Tested with an IOI FWB-IDE01AB which has its link-on bit set if bus power is available but does not respond to ROM read requests if self power is off. This implements
  - recognition of the units if self power is switched on after fw-core gave up the initial attempt to read the config ROM,
  - shutdown of the units when self power is switched off.
Also tested with a second PC running Linux/ieee1394. When the eth1394 driver is inserted and removed on that node, fw-core now notices the addition and removal of the IPv4 unit on the ieee1394 node.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-04-18 | firewire: replace static ROM cache by allocated cache | Stefan Richter
read_bus_info_block() is repeatedly called by workqueue jobs. These will step on each other's toes eventually if there are multiple workqueue threads, and we end up with corrupt config ROM images. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
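The idea behind replacing a shared static buffer with a per-call allocation can be sketched in plain userspace C. This is only an illustration, not the driver's code: `read_rom_copy`, `fake_read_quadlet`, and `ROM_QUADLETS` are hypothetical names, and the "ROM read" is simulated.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ROM_QUADLETS 256  /* hypothetical scratch size for this sketch */

/* Hypothetical stand-in for reading one quadlet from a node's config ROM. */
static uint32_t fake_read_quadlet(int node, size_t index)
{
    return ((uint32_t)node << 16) | (uint32_t)index;
}

/*
 * Each call works on its own heap buffer, so two concurrent callers
 * cannot overwrite each other's partially read ROM image the way they
 * could with one shared static array.
 * Returns a malloc'ed copy of the first `length` quadlets, or NULL.
 */
uint32_t *read_rom_copy(int node, size_t length)
{
    uint32_t *rom, *result;
    size_t i;

    if (length == 0 || length > ROM_QUADLETS)
        return NULL;

    rom = calloc(ROM_QUADLETS, sizeof(*rom));  /* per-call scratch buffer */
    if (rom == NULL)
        return NULL;

    for (i = 0; i < length; i++)
        rom[i] = fake_read_quadlet(node, i);

    result = malloc(length * sizeof(*result));
    if (result != NULL)
        memcpy(result, rom, length * sizeof(*result));

    free(rom);
    return result;
}
```

The caller owns the returned buffer and frees it when done; no state survives between calls.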
2008-04-18 | firewire: fw-ohci: work around generation bug in TI controllers (fix AV/C and more) | Stefan Richter
Unlike the ohci1394 driver, fw-ohci uses the selfIDGeneration field of bus reset packets to determine the generation of incoming requests as per OHCI 1.1 clause 8.4.2.3. This is more precise --- provided that the controller inserts the correct generation. Texas Instruments chips often don't. This prevented the transmission of response packets, which for example broke AV/C transactions as used when communicating with miniDV cameras and any other AV/C devices. There is apparently no way to detect and adjust incorrect generations. Therefore we ignore the generation of bus reset packets from TI chips and use the generation of the self ID buffer instead. Alas this is received at a slightly wrong time. In rare cases, this could cause us to not respond to legitimate requests or to respond to expired requests. (The latter is less likely because the bus reset packet AR event is typically handled before the self ID complete event.) Bug reported by Mladen Kuntner, who was extraordinarily patient while dealing with the driver maintainers. Fix confirmed to be required and effective for TSB82AA2 and a TSB43AB22 or TSB43AB22A. https://bugzilla.redhat.com/show_bug.cgi?id=243081 Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-04-18 | firewire: fw-ohci: extend logging of bus generations and node ID | Stefan Richter
Extend the logging of "AR evt_bus_reset, link internal" to "AR evt_bus_reset, generation ${selfIDGeneration}". That way we can check whether this generation matches the one seen in self ID complete event logging. See OHCI 1.1 clause 8.4.2.3. Also extend logging of "firewire_ohci: * selfIDs, generation *" by "local node ID ffc*" in self ID logging to make the local node in AT/AR event logs more obvious. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-04-18 | firewire: fw-ohci: conditionally log busReset interrupts | Stefan Richter
Add a debug option to watch bus reset interrupt events. Half of this patch is taken from Jarod Wilson's first version of the JMicron fix. BusReset interrupts are only generated if the respective module parameter flag was set before the controller is initialized. Else we keep this event masked to reduce IRQ load in normal operation and to avoid potential problems with buggy chips. Note, this is unlike the other IRQ events whose logging can be enabled any time after chip initialization. This and the influence on what interrupts the chip generates is why I added an extra flag for it. Also, reorder the debug parameter flags according to their perceived usefulness. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-04-18 | firewire: fw-ohci: don't append to AT context when it's not active | Jarod Wilson
I finally tracked down the issues with this JMicron PCI-e card in my possession to a failure to comply with section 7.2.3.2 of the OHCI 1.1 specification (thanks to Kristian for the pointer to illustrate that it is indeed a flaw in this card, not the driver). The controller should simply flush the packets we've appended to its AT queue if a bus reset occurs before they've been transmitted and we'll try again, but something goes wrong and the controller winds up hung. However, we can avoid the problem by simply checking if the IntEvent.busReset register had been set before we try appending to the AT context. When busReset is set, the AT context is completely halted until busReset is cleared, so there's no point in appending AT packets until the register is cleared. So at_context_queue_packet() now checks for busReset being set, and bails with an RCODE_GENERATION packet ack, which results in us trying to append the packet again after recognizing the fact there has been a bus reset, and clearing busReset. Signed-off-by: Jarod Wilson <jwilson@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
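The check described above can be sketched as follows. This is a simplified userspace model, not the driver's at_context_queue_packet(): the context struct, the `sketch_` names, and the simulated register are assumptions made for illustration, and the numeric values of the bit mask and return codes are illustrative as well.

```c
#include <stdint.h>

/* Illustrative bit mask for the busReset flag in a simulated IntEvent register. */
#define SKETCH_INT_EVENT_BUS_RESET (1u << 17)

/* Illustrative result codes; the names mirror the commit text, the values are assumed. */
enum {
    SKETCH_QUEUED = 0,
    SKETCH_RCODE_GENERATION = 0x13
};

struct sketch_at_context {
    uint32_t int_event;   /* simulated IntEvent register contents */
    unsigned int queued;  /* number of packets appended so far */
};

/*
 * Refuse to append while busReset is pending: the AT context is halted
 * until the flag is cleared, so the caller is told to retry later.
 */
int sketch_queue_packet(struct sketch_at_context *ctx)
{
    if (ctx->int_event & SKETCH_INT_EVENT_BUS_RESET)
        return SKETCH_RCODE_GENERATION;

    ctx->queued++;
    return SKETCH_QUEUED;
}
```

Once the bus reset has been handled and the flag cleared, the same packet can be queued again and goes through.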
2008-04-18 | firewire: fw-ohci: log regAccessFail events | Jarod Wilson
While trying to debug this piece of crap JMicron PCI-e controller in my possession, one thought was that perhaps I was encountering register access failures. I'm not, but logging them would be good, so we can see if they are a real problem we should be taking into account anywhere in the code. Signed-off-by: Jarod Wilson <jwilson@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (added list contact)
2008-04-18 | firewire: fw-ohci: make sure HCControl register LPS bit is set | Jarod Wilson
I've now witnessed multiple occasions where one of my controllers (a very poorly working JMicron PCIe card) fails to get its registers properly set up in ohci_enable(), apparently due to an occasionally very slow to initiate SClk. The easy fix for this problem is to add a tiny while loop to try again a time or three after initially enabling LPS before we move on (or give up). Of course, the card still isn't fully functional yet, but this gets it at least one tiny step closer... Signed-off-by: Jarod Wilson <jwilson@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
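A bounded retry loop of the kind described might look like the sketch below. The hardware is simulated: `struct sketch_hw`, the `sketch_` helpers, and the retry count passed by the caller are all hypothetical, and the bit position used for LPS is illustrative. A real driver would also sleep between polls.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HC_CONTROL_LPS (1u << 19)  /* illustrative bit position */

/* Simulated controller whose LPS bit only reads back after a few polls. */
struct sketch_hw {
    bool lps_requested;
    int polls_until_ready;  /* models a slow-to-start SClk */
};

static void sketch_set_lps(struct sketch_hw *hw)
{
    hw->lps_requested = true;
}

static uint32_t sketch_read_hc_control(struct sketch_hw *hw)
{
    if (!hw->lps_requested)
        return 0;
    if (hw->polls_until_ready > 0) {
        hw->polls_until_ready--;
        return 0;
    }
    return SKETCH_HC_CONTROL_LPS;
}

/* Enable LPS, then poll up to max_tries times before giving up. */
bool sketch_enable_lps(struct sketch_hw *hw, int max_tries)
{
    int i;

    sketch_set_lps(hw);
    for (i = 0; i < max_tries; i++) {
        /* a real driver would wait briefly here before each read */
        if (sketch_read_hc_control(hw) & SKETCH_HC_CONTROL_LPS)
            return true;
    }
    return false;  /* give up: the bit never became visible */
}
```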
2008-04-18 | firewire: fw-ohci: missing PPC PMac feature calls in failure path | Stefan Richter
Balance ohci_pmac_on and ohci_pmac_off if pci_driver.probe fails. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-04-18 | firewire: fw-ohci: untangle a mixed unsigned/signed expression | Stefan Richter
and make another expression more readable. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-04-18 | firewire: debug interrupt events | Stefan Richter
This adds debug printks for asynchronous transmission and reception and for self ID reception. They can be enabled at module load time, and at runtime via /sys/module/firewire_ohci/parameters/debug. Signed-off-by: Jarod Wilson <jwilson@redhat.com> Also added: Logging of interrupt event codes and of cancelled AT packets. The code now depends on a Kconfig variable. This makes it easier to build firewire-ohci without the feature or to make it an option in the future. The variable is currently hidden and always on. This feature inflates firewire-ohci.ko by 7 kB = 27% on x86-64 and by 4 kB = 23% on i686. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-04-18 | firewire: fw-ohci: catch self_id_count == 0 | Stefan Richter
fw_core_handle_bus_reset() incorrectly relied on the assumption that self_id_count > 0. We check early in fw-ohci and discard the self ID complete event if self_id_count == 0 because a valid event always has at least one self ID packet in it (the one of the local node). Hence treat self_id_count == 0 like any other kind of invalid self ID buffer. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-04-18 | firewire: fw-ohci: add self ID error check | Stefan Richter
Discard self ID buffer contents if
  - the selfIDError flag is set,
  - any of the self ID packets has bit errors.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
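The self ID checks from this commit and the self_id_count == 0 commit above can be combined into one validation routine, sketched here. It assumes that each self ID quadlet in the buffer is followed by its bitwise inverse, so a mismatch indicates bit errors; that layout and all `sketch_` names are assumptions of this example, not taken from the commit text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * buf holds self_id_count pairs of (quadlet, inverted quadlet).
 * Returns false if the buffer must be discarded.
 */
bool sketch_self_ids_valid(const uint32_t *buf, size_t self_id_count,
                           bool self_id_error)
{
    size_t i;

    if (self_id_error)        /* controller flagged the whole buffer */
        return false;
    if (self_id_count == 0)   /* a valid event has at least the local node */
        return false;

    for (i = 0; i < self_id_count; i++) {
        /* assumed layout: quadlet followed by its bitwise inverse */
        if (buf[2 * i] != (uint32_t)~buf[2 * i + 1])
            return false;
    }
    return true;
}
```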
2008-04-18 | firewire: fw-ohci: refactor probe, remove, suspend, resume | Stefan Richter
Clean up shared code and variable names. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-04-18 | firewire: fw-ohci: switch on bus power after resume on PPC PMac | Stefan Richter
The platform feature calls in the suspend method switched off cable power, but the calls in the resume method did not switch it back on. Add the necessary feature call to .resume. Also add the corresponding call to .suspend to make .suspend's behavior explicitly the same on all PMacs. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-04-18 | firewire: fw-ohci: add option for remote debugging | Stefan Richter
This way firewire-ohci can be used for remote debugging like ohci1394. Version with amendment from Fri, 11 Apr 2008 00:08:08 +0200. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Bernhard Kaindl <bk@suse.de>
2008-04-18 | firewire: fw-sbp2: set dual-phase cycle_limit | Jarod Wilson
Try to write dual-phase retry protocol limits to BUSY_TIMEOUT register.
  - The dual-phase retry protocol is optional to implement, and if not supported, writes to the dual-phase portion of the register will be ignored. We try to write the original 1394-1995 default here.
  - In the case of devices that are also SBP-3-compliant, all writes are ignored, as the register is read-only, but contains single-phase retry of 15, which is what we're trying to set for all SBP-2 devices anyway, so this write attempt is safe and yields more consistent behavior for all devices.
See section 8.3.2.3.5 of the 1394-1995 spec, section 6.2 of the SBP-2 spec, and section 6.4 of the SBP-3 spec for further details.
Signed-off-by: Jarod Wilson <jwilson@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
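As a rough illustration of how such a register value could be composed, the sketch below ORs a single-phase retry limit of 15 with a dual-phase cycle limit. The field position (shift by 12) and the cycle limit value of 0xc8 are assumptions made for this example; the commit text only states that the 1394-1995 default is written and that the single-phase retry count is 15. All names are hypothetical.

```c
#include <stdint.h>

/* Single-phase retry count of 15, as stated in the commit message. */
#define SKETCH_RETRY_LIMIT 0xfu

/* Assumed dual-phase cycle_limit value and field position (not from the source). */
#define SKETCH_CYCLE_LIMIT (0xc8u << 12)

/* Value, in CPU byte order, that would be written to BUSY_TIMEOUT. */
uint32_t sketch_busy_timeout_value(void)
{
    return SKETCH_CYCLE_LIMIT | SKETCH_RETRY_LIMIT;
}
```

A device that implements only the single-phase protocol would, per the commit text, ignore the upper field and keep just the retry limit.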
2008-04-18 | firewire: fw-sbp2: reduce log noise | Stefan Richter
The block/unblock logic is now sufficiently tested. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-04-18 | firewire: fw-sbp2: remove unnecessary memset | Stefan Richter
orb came from kzalloc. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-04-18 | firewire: fw-sbp2: simplify some macros | Stefan Richter
How hard can it be to switch on one bit? :-) Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-04-18 | firewire: fw-sbp2: remove usages of fw_memcpy_to_be32 | Stefan Richter
Write directly in big endian instead of byte-swapping after the fact. This saves a few conversions, lets gcc use constant endianness conversions where possible, and enables deeper endianness annotation. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
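The difference between converting at assignment time and swapping a whole buffer afterwards can be sketched in userspace C. `sketch_cpu_to_be32` is a portable stand-in for the kernel's cpu_to_be32(), and `struct sketch_orb` with its fields is a hypothetical structure, not the driver's ORB layout.

```c
#include <stdint.h>
#include <string.h>

/* Portable stand-in for the kernel's cpu_to_be32(); hypothetical name. */
static uint32_t sketch_cpu_to_be32(uint32_t v)
{
    uint8_t b[4] = {
        (uint8_t)(v >> 24), (uint8_t)(v >> 16),
        (uint8_t)(v >> 8), (uint8_t)v
    };
    uint32_t be;

    memcpy(&be, b, sizeof(be));
    return be;
}

/* Hypothetical request block; every field is stored big endian. */
struct sketch_orb {
    uint32_t pointer_high;
    uint32_t pointer_low;
    uint32_t misc;
};

/*
 * Each field is converted once, where it is assigned. No second pass
 * over the finished structure is needed, and constant arguments can
 * be converted by the compiler at build time.
 */
void sketch_fill_orb(struct sketch_orb *orb, uint64_t bus_addr, uint32_t misc)
{
    orb->pointer_high = sketch_cpu_to_be32((uint32_t)(bus_addr >> 32));
    orb->pointer_low = sketch_cpu_to_be32((uint32_t)bus_addr);
    orb->misc = sketch_cpu_to_be32(misc);
}
```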
2008-04-18 | firewire: fw-sbp2: relax SCSI DMA alignment | Stefan Richter
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-04-18 | firewire: refactor fw_unit reference counting | Stefan Richter
Add wrappers for getting and putting a unit. Remove some line breaks. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-04-18 | firewire: fw-sbp2: fix reference counting | Stefan Richter
The reference count of the unit dropped too low in an error path in sbp2_probe. Fixed by moving the _get further up. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-04-18 | firewire: remove superfluous reference counting | Stefan Richter
The card->kref became obsolete since patch "firewire: fix crash in automatic module unloading" added another counter of card users. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-03-27 | firewire: fw-ohci: plug dma memory leak in AR handler | Jarod Wilson
There's an ugly little memory leak in firewire-ohci's ar_context_tasklet(), where we're not freeing up some of the memory we use for each ar_buffer, due to a moving pointer. The problem has been there for a while, but didn't get noticed until after converting the AR routines over to use coherent DMA and I started running into I/O stall-outs with the following message output repeatedly to the console:
PCI-DMA: Out of IOMMU space for 53248 bytes at device 0000:04:09.0
Plugging this leak is definitely necessary, but unfortunately, isn't the entire answer to my problem, it only increases the amount of I/O that I can do before hitting the problem. Still working on tracking down the root cause.
Signed-off-by: Jarod Wilson <jwilson@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-03-20 | firewire: fix panic in handle_at_packet | Stefan Richter
This fixes a use-after-free bug in the handling of split transactions. The AT DMA handler of the request was occasionally executed after the AR DMA handler of the response. The AT DMA handler then accessed an already freed packet. Reported by Johannes Berg. http://bugzilla.kernel.org/show_bug.cgi?id=9617 Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Tested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-03-14 | firewire: fw-ohci: shut up false compiler warning on PPC32 | Stefan Richter
Shut up "may be used uninitialised in this function" warnings due to PPC32's implementation of dma_alloc_coherent(). Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-03-14 | firewire: fw-ohci: use dma_alloc_coherent for ar_buffer | Jarod Wilson
Currently, we do nothing to guarantee we have a consistent DMA buffer for asynchronous receive packets. Rather than doing several sync's following a dma_map_single() to get consistent buffers, just switch to using dma_alloc_coherent(). Resolves constant buffer failures on my own x86_64 laptop w/4GB of RAM and likely to fix a number of other failures witnessed on x86_64 systems with 4GB of RAM or more. Signed-off-by: Jarod Wilson <jwilson@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-03-14 | firewire: fw-sbp2: fix for SYM13FW500 bridge (Datafab disk) | Stefan Richter
Fix I/O errors due to SYM13FW500's inability to handle larger request sizes. Reported by Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> in https://bugzilla.redhat.com/show_bug.cgi?id=436879 Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-03-14 | firewire: update Kconfig help text | Stefan Richter
Remove some less necessary information, point out that video1394 and dv1394 should be blacklisted along with ohci1394. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-03-14 | firewire: warn on fatal condition in topology code | Stefan Richter
If this ever happens to anybody, we want to have it in his log. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-03-14 | firewire: fw-sbp2: set single-phase retry_limit | Jarod Wilson
Per the SBP-2 specification, all SBP-2 target devices must have a BUSY_TIMEOUT register. Per the 1394-1995 specification, the retry_limit portion of the register should be set to 0x0 initially, and set on the target by a logged in initiator (i.e., a Linux host w/firewire controller(s)). Well, as it turns out, lots of devices these days have actually moved on to starting to implement SBP-3 compliance, which says that retry_limit should default to 0xf instead (yes, SBP-3 stomps directly on 1394-1995, oops). Prior to this change, the firewire driver stack didn't touch retry_limit, and any SBP-3 compliant device worked fine, while SBP-2 compliant ones were unable to retransmit when the host returned an ack_busy_X, which resulted in stalled out I/O, eventually causing the SCSI layer to give up and offline the device. The simple fix is for us to set retry_limit to 0xf in the register for all devices (which actually matches what the old ieee1394 stack did). Prior to this change, a hard disk behind an SBP-2 Prolific PL-3507 bridge chip would routinely encounter buffer I/O errors and wind up offlined by the SCSI layer. With this change, I've encountered zero I/O failures moving tens of GB of data around. Signed-off-by: Jarod Wilson <jwilson@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-03-14 | firewire: fw-ohci: Apple UniNorth 1st generation support | Stefan Richter
Mostly copied from ohci1394.c. Necessary for some older Macs, e.g. PowerBook G3 Pismo and early PowerBook G4 Titanium. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-03-14 | firewire: fw-ohci: PPC PMac platform code | Stefan Richter
Copied from ohci1394.c. This code is necessary to prevent machine check exceptions when reloading or resuming the driver. Tested on a 1st generation PowerBook G4 Titanium, which also needs the pci_probe() hunk. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> I was able to reproduce the system exception on resume with a 3rd-gen Titanium PowerBook G4 667, and this patch does let the system resume successfully now. Not quite clear if there was possibly an updated version coming using pci_enable_device() instead of the pair of pmac_call_feature() calls, but either way, this is a definite must-have, at least for older ppc macs -- my Aluminum PowerBook G4/1.67 suspends and resumes without this patch just fine. Signed-off-by: Jarod Wilson <jwilson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-03-14 | firewire: endianess annotations | Stefan Richter
Kills warnings from 'make C=1 CHECKFLAGS="-D__CHECK_ENDIAN__" modules':
drivers/firewire/fw-transaction.c:771:10: warning: incorrect type in assignment (different base types)
drivers/firewire/fw-transaction.c:771:10: expected unsigned int [unsigned] [usertype] <noident>
drivers/firewire/fw-transaction.c:771:10: got restricted unsigned int [usertype] <noident>
drivers/firewire/fw-transaction.h:93:10: warning: incorrect type in assignment (different base types)
drivers/firewire/fw-transaction.h:93:10: expected unsigned int [unsigned] [usertype] <noident>
drivers/firewire/fw-transaction.h:93:10: got restricted unsigned int [usertype] <noident>
drivers/firewire/fw-ohci.c:1490:8: warning: restricted degrades to integer
drivers/firewire/fw-ohci.c:1490:35: warning: restricted degrades to integer
drivers/firewire/fw-ohci.c:1516:5: warning: cast to restricted type
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-03-14 | firewire: endianess fix | Stefan Richter
The generation of incoming requests was filled in in wrong byte order on machines with big endian CPU. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-03-02 | firewire: fix crash in automatic module unloading | Stefan Richter
"modprobe firewire-ohci; sleep .1; modprobe -r firewire-ohci" used to result in crashes like this:
BUG: unable to handle kernel paging request at ffffffff8807b455
IP: [<ffffffff8807b455>]
PGD 203067 PUD 207063 PMD 7c170067 PTE 0
Oops: 0010 [1] PREEMPT SMP
CPU 0
Modules linked in: i915 drm cpufreq_ondemand acpi_cpufreq freq_table applesmc input_polldev led_class coretemp hwmon eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss button thermal processor sg snd_hda_intel snd_pcm snd_timer snd snd_page_alloc sky2 i2c_i801 rtc [last unloaded: crc_itu_t]
Pid: 9, comm: events/0 Not tainted 2.6.25-rc2 #3
RIP: 0010:[<ffffffff8807b455>] [<ffffffff8807b455>]
RSP: 0018:ffff81007dcdde88 EFLAGS: 00010246
RAX: ffff81007dc95040 RBX: ffff81007dee5390 RCX: 0000000000005e13
RDX: 0000000000008c8b RSI: 0000000000000001 RDI: ffff81007dee5388
RBP: ffff81007dc5eb40 R08: 0000000000000002 R09: ffffffff8022d05c
R10: ffffffff8023b34c R11: ffffffff8041a353 R12: ffff81007dee5388
R13: ffffffff8807b455 R14: ffffffff80593bc0 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff8055a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff8807b455 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process events/0 (pid: 9, threadinfo ffff81007dcdc000, task ffff81007dc95040)
Stack: ffffffff8023b396 ffffffff88082524 0000000000000000 ffffffff8807d9ae
 ffff81007dc5eb40 ffff81007dc9dce0 ffff81007dc5eb40 ffff81007dc5eb80
 ffff81007dc9dce0 ffffffffffffffff ffffffff8023be87 0000000000000000
Call Trace:
 [<ffffffff8023b396>] ? run_workqueue+0xdf/0x1df
 [<ffffffff8023be87>] ? worker_thread+0xd8/0xe3
 [<ffffffff8023e917>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff8023bdaf>] ? worker_thread+0x0/0xe3
 [<ffffffff8023e813>] ? kthread+0x47/0x74
 [<ffffffff804198e0>] ? trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020c008>] ? child_rip+0xa/0x12
 [<ffffffff8020b6e3>] ? restore_args+0x0/0x3d
 [<ffffffff8023e68a>] ? kthreadd+0x14c/0x171
 [<ffffffff8023e68a>] ? kthreadd+0x14c/0x171
 [<ffffffff8023e7cc>] ? kthread+0x0/0x74
 [<ffffffff8020bffe>] ? child_rip+0x0/0x12
Code: Bad RIP value.
RIP [<ffffffff8807b455>]
RSP <ffff81007dcdde88>
CR2: ffffffff8807b455
---[ end trace c7366c6657fe5bed ]---
Note that this crash happened _after_ firewire-core was unloaded. The shared workqueue tried to run firewire-core's device initialization jobs or similar jobs. The fix makes sure that firewire-ohci and hence firewire-core is not unloaded before all device shutdown jobs have been completed. This is determined by the count of device initializations minus device releases. Also skip useless retries in the node initialization job if the node is to be shut down.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
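The "initializations minus releases" bookkeeping can be modelled with a single counter, as in the sketch below. It is a userspace illustration using C11 atomics; `struct sketch_card` and the function names are hypothetical, and the real fix additionally has to wait for the count to reach zero rather than just test it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-card bookkeeping of devices that still have jobs pending. */
struct sketch_card {
    atomic_int device_count;  /* initializations minus releases */
};

/* Called when a device initialization job is scheduled. */
void sketch_device_initialized(struct sketch_card *card)
{
    atomic_fetch_add(&card->device_count, 1);
}

/* Called when the device's release has completed. */
void sketch_device_released(struct sketch_card *card)
{
    atomic_fetch_sub(&card->device_count, 1);
}

/* The module may only go away once every initialized device was released. */
bool sketch_may_unload(struct sketch_card *card)
{
    return atomic_load(&card->device_count) == 0;
}
```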
2008-03-02 | firewire: potentially invalid pointers used in fw_card_bm_work | Stefan Richter
The bus management workqueue job was in danger of dereferencing NULL pointers. Also, after having temporarily lifted card->lock, a few node pointers and a device pointer may have become invalid. Add NULL pointer checks and get the necessary references. Also, move card->local_node out of fw_card_bm_work's sight during shutdown of the card. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-03-02 | firewire: fw-sbp2: better fix for NULL pointer dereference in scsi_remove_device | Stefan Richter
Patch "firewire: fw-sbp2: fix NULL pointer deref. in scsi_remove_device" had the unintended effect that firewire-sbp2 could not be unloaded anymore until all SBP-2 devices were unplugged. We now fix the NULL pointer bug by reacquiring a reference to the sdev instead of holding a reference to the sdev (and to the module) all the time. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Tested-by: Jarod Wilson <jwilson@redhat.com>
2008-02-21 | firewire: fix NULL pointer deref. and resource leak | Stefan Richter
By supplying ioctl()s in the wrong order, a userspace client was able to trigger NULL pointer dereferences. Furthermore, by calling ioctl_create_iso_context more than once, new contexts could be created without ever freeing the previously created contexts. Thanks to Anders Blomdell for the report. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-02-19 | firewire: fw-sbp2: fix NULL pointer deref. in scsi_remove_device | Stefan Richter
Fix a kernel bug when unplugging an SBP-2 device after having its scsi_device already removed via the "delete" sysfs attribute. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-02-19 | firewire: fw-sbp2: fix NULL pointer deref. in slave_alloc | Stefan Richter
Fix a kernel bug when running rescan-scsi-bus while a FireWire disk is connected: http://bugzilla.kernel.org/show_bug.cgi?id=10008 Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-02-19 | firewire: fw-sbp2: (try to) avoid I/O errors during reconnect | Stefan Richter
While fw-sbp2 takes the necessary time to reconnect to a logical unit after bus reset, the SCSI core keeps sending new commands. They are all immediately completed with host busy status, and application clients or filesystems will break quickly. The SCSI device might even be taken offline: http://bugzilla.kernel.org/show_bug.cgi?id=9734 The only remedy seems to be to block the SCSI device until reconnect. Alas the SCSI core has no useful API to block only one logical unit i.e. the scsi_device, therefore we block the entire Scsi_Host. This currently corresponds to an SBP-2 target. In case of targets with multiple logical units, we need to satisfy the dependencies between logical units by carefully tracking the blocking state of the target and its units. We block all logical units of a target as soon as one of them needs to be blocked, and keep them blocked until all of them are ready to be unblocked. Furthermore, as the history of the old sbp2 driver has shown, the scsi_block_requests() API is a minefield with high potential of deadlocks. We therefore take extra measures to keep logical units unblocked during __scsi_add_device() and during shutdown. This avoids I/O errors during reconnect in many but alas not in all cases. There may still be errors after a re-login had to be performed. Also, some bridges have been seen to cease fetching management ORBs if I/O went on up until a bus reset. In these cases, all management ORBs time out after mgt_orb_timeout. The old sbp2 driver is less vulnerable or maybe not vulnerable to this, for as yet unknown reasons. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
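The blocking rule for targets with several logical units (block everything as soon as one unit needs it, unblock only when none does) can be sketched with a counter. The structure and function names below are hypothetical and the "host" is just a flag; this is not the driver's actual state tracking.

```c
#include <stdbool.h>

#define SKETCH_MAX_LUS 8  /* hypothetical limit for this sketch */

struct sketch_target {
    bool lu_blocked[SKETCH_MAX_LUS];  /* which logical units need blocking */
    int blocked;                      /* how many of them currently do */
    bool host_blocked;                /* stands in for blocking the Scsi_Host */
};

/* The first logical unit that needs blocking blocks the whole target. */
void sketch_block_lu(struct sketch_target *tgt, int lu)
{
    if (tgt->lu_blocked[lu])
        return;
    tgt->lu_blocked[lu] = true;
    if (++tgt->blocked == 1)
        tgt->host_blocked = true;
}

/* Only the last logical unit to become ready unblocks the target. */
void sketch_unblock_lu(struct sketch_target *tgt, int lu)
{
    if (!tgt->lu_blocked[lu])
        return;
    tgt->lu_blocked[lu] = false;
    if (--tgt->blocked == 0)
        tgt->host_blocked = false;
}
```

The per-unit flag makes repeated block or unblock calls for the same unit harmless, which keeps the counter from drifting.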
2008-02-16 | firewire: fw-sbp2: enforce a retry of __scsi_add_device if bus generation changed | Stefan Richter
fw-sbp2 is unable to reconnect while performing __scsi_add_device because there is only a single workqueue thread context available for both at the moment. This should be fixed eventually. An actual failure of __scsi_add_device is easy to handle, but an incomplete execution of __scsi_add_device with an sdev returned would remain undetected and leave the SBP-2 target unusable. Therefore we use a workaround: If there was a bus reset during __scsi_add_device (i.e. during the SCSI probe), we remove the new sdev immediately, log out, and attempt login and SCSI probe again. Tested-by: Jarod Wilson <jwilson@redhat.com> (earlier version) Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
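The workaround amounts to comparing the bus generation before and after the probe and starting over if it changed. A minimal simulation of that loop is sketched below; the bus model, the `sketch_` names, and the attempt limit are assumptions of this example.

```c
#include <stdbool.h>

struct sketch_bus {
    int generation;      /* current bus generation */
    int resets_pending;  /* simulated bus resets that will hit during probes */
};

/* Hypothetical stand-in for the SCSI probe; a pending reset bumps the generation. */
static bool sketch_scsi_probe(struct sketch_bus *bus)
{
    if (bus->resets_pending > 0) {
        bus->resets_pending--;
        bus->generation++;
    }
    return true;  /* the probe itself reports success either way */
}

/*
 * Returns the number of attempts needed, or -1 if no attempt completed
 * without a bus reset in between.
 */
int sketch_login_and_probe(struct sketch_bus *bus, int max_attempts)
{
    int attempt;

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        int generation = bus->generation;  /* snapshot taken at login */

        if (!sketch_scsi_probe(bus))
            return -1;  /* a real failure is the easy case */
        if (bus->generation == generation)
            return attempt;  /* probe ran undisturbed */
        /* reset during probe: drop the new device, log out, try again */
    }
    return -1;
}
```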
2008-02-16 | firewire: fw-sbp2: sort includes | Stefan Richter
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-02-16 | firewire: fw-sbp2: logout and login after failed reconnect | Stefan Richter
If fw-sbp2 was too late with requesting the reconnect, the target would reject this. In this case, log out before attempting the reconnect. Else several firmwares will deny the re-login because they somehow didn't invalidate the old login. Also, don't retry reconnects in this situation. The retries won't succeed either. These changes improve chances for successful re-login and shorten the period during which the logical unit is inaccessible. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-02-16 | firewire: fw-sbp2: don't add scsi_device twice | Stefan Richter
When a reconnect failed but re-login succeeded, __scsi_add_device was called again. In those cases, __scsi_add_device succeeded and returned the pointer to the existing scsi_device. fw-sbp2 then continued orderly, except that it failed to call sbp2_cancel_orbs. SCSI core would call fw-sbp2's eh_abort_handler eventually if there had been an outstanding command. This patch avoids the needless lookups and temporary allocations in SCSI core, as well as the I/O stall and timeout until eh_abort_handler hits. Also, __scsi_add_device tolerating calls for devices which already exist is undocumented behavior on which we shouldn't rely. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-02-16 | firewire: fw-sbp2: log bus_id at management request failures | Stefan Richter
for easier readable logs if more than one SBP-2 device is present. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-02-16 | firewire: fw-sbp2: wait for completion of fetch agent reset | Stefan Richter
Like the old sbp2 driver, wait for the write transaction to the AGENT_RESET to complete before proceeding (after login, after reconnect, or in SCSI error handling). There is one occasion where AGENT_RESET is written to from atomic context when getting DEAD status for a command ORB. There we still continue without waiting for the transaction to complete because this is more difficult to fix... Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>