Age | Commit message | Author |
|
|
|
The oops is characteristic of the underlying device being removed from
visibility before the class device, and sure enough we do device_del()
before transport_unregister() in the scsi_target_reap() routines. I've
no idea why this is suddenly showing up, since the code has been in
there since that function was first invented. However, I've confirmed
this fixes Andrew Vasquez's boot oops.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
scsi_reap_target() was designed to be called from any context.
However it must do a device_del() of the target device, which may only
be called from user context. Thus we have to reimplement
scsi_reap_target() via a workqueue.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
The patch below marks a few SCSI core data structures as const, so that they end up
in the .rodata section and don't share cache lines with things that get dirtied.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
There is a double free in the scsi scan code if a LLDD's slave_alloc()
call fails. There is a direct call to scsi_free_queue and then the
following put_device calls the release function, which also frees the
queue.
Remove the redundant scsi_free_queue.
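The double free described above can be reproduced in a small userspace model. This is only a sketch: fake_sdev, fake_put_device() and the other fake_* names are hypothetical stand-ins, not the kernel's structures or functions. It shows why dropping the reference is enough once the release callback owns the queue.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct fake_queue { int unused; };

struct fake_sdev {
	int refcount;              /* models the embedded device refcount */
	struct fake_queue *queue;  /* owned by the device, freed on release */
};

static int queue_frees;        /* counts how often a queue was freed */

static void fake_free_queue(struct fake_queue *q)
{
	queue_frees++;
	free(q);
}

/* Release callback: runs once, when the last reference is dropped. */
static void fake_sdev_release(struct fake_sdev *sdev)
{
	fake_free_queue(sdev->queue);
	free(sdev);
}

static void fake_put_device(struct fake_sdev *sdev)
{
	assert(sdev->refcount > 0);
	if (--sdev->refcount == 0)
		fake_sdev_release(sdev);
}

static struct fake_sdev *fake_alloc_sdev(void)
{
	struct fake_sdev *sdev = calloc(1, sizeof(*sdev));

	if (!sdev)
		return NULL;
	sdev->queue = calloc(1, sizeof(*sdev->queue));
	if (!sdev->queue) {
		free(sdev);
		return NULL;
	}
	sdev->refcount = 1;
	return sdev;
}

/*
 * Error path after a failed slave_alloc(): only drop the reference.
 * Calling fake_free_queue() here as well would free the queue twice,
 * because the release callback frees it again.
 */
static int fake_slave_alloc_failed(void)
{
	struct fake_sdev *sdev = fake_alloc_sdev();

	if (!sdev)
		return -1;
	fake_put_device(sdev);
	return 0;
}
```

After one pass through the error path the free counter should read exactly one.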
Signed-off-by: Brian King <brking@us.ibm.com>
Tested-by: Nathan Lynch <ntl@pobox.com>
[ Also removed some strange whitespace artifacts in that area ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
rejections fixed and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
This should eliminate (at least in the mid layer) the need to make numeric
assumptions about any of the enumeration variables. As a side effect,
it will also make all the messages consistent and line us up nicely for
the error logging strategy (if it ever shows itself again).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
Currently we just ignore the device, which means there are a few
arrays out there that we don't find.
This patch updates the scsi_report_lun_scan() to take a target instead
of a device so it can be called on a return of
SCSI_SCAN_TARGET_PRESENT, which is what a PQ 3 device returns.
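For reference, the peripheral qualifier is the top three bits of byte 0 of the INQUIRY data, and the device type is the low five bits. The sketch below assumes that layout; the enum and function names are hypothetical and only mirror the idea that a PQ 3 answer means "target present" and should still lead to a REPORT LUNS attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, loosely modelled on the scan results named above. */
enum scan_result { SCAN_NO_RESPONSE, SCAN_TARGET_PRESENT, SCAN_LUN_PRESENT };

/* Peripheral qualifier: bits 7..5 of INQUIRY byte 0. */
static unsigned int inquiry_pq(const uint8_t *inq)
{
	assert(inq != NULL);
	return (inq[0] >> 5) & 0x7;
}

/* Peripheral device type: bits 4..0 of INQUIRY byte 0. */
static unsigned int inquiry_type(const uint8_t *inq)
{
	assert(inq != NULL);
	return inq[0] & 0x1f;
}

/* Classify the answer from LUN 0; PQ 3 means no device at this LUN. */
static enum scan_result classify_lun0(const uint8_t *inq, int responded)
{
	if (!responded)
		return SCAN_NO_RESPONSE;
	if (inquiry_pq(inq) == 3)
		return SCAN_TARGET_PRESENT;
	return SCAN_LUN_PRESENT;
}

/* A target that answered at all is worth a REPORT LUNS scan. */
static int should_try_report_luns(enum scan_result res)
{
	return res == SCAN_TARGET_PRESENT || res == SCAN_LUN_PRESENT;
}
```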
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
This patch (as545) fixes the list traversals in __scsi_remove_target and
scsi_forget_host. In each case the existing code used list_for_each_entry_safe
in an _unsafe_ manner, because the list was not protected from outside
modification while the iteration was running.
The new scsi_forget_host routine takes the moderately controversial step
of iterating over devices for removal rather than iterating over targets.
This makes more sense to me because the current scheme treats targets as
second-class citizens, created and removed on demand, rather than as
objects corresponding to actual hardware. (Also I couldn't figure out any
safe way to iterate over the target list, since it's not so easy to tell
when a target has already been removed.)
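One way to traverse a list that others may modify is to restart from the head every time the lock is dropped, instead of trusting a cached next pointer. The userspace sketch below illustrates that pattern only; struct host, struct node and the integer "lock" are hypothetical stand-ins and this is not the patch's actual code.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	struct node *next;
	int id;
	int deleted;      /* set once removal of this node has started */
};

struct host {
	struct node *head;
	int locked;       /* stand-in for the host lock */
	int removed;      /* number of nodes removed so far */
};

static void host_lock(struct host *h)   { assert(!h->locked); h->locked = 1; }
static void host_unlock(struct host *h) { assert(h->locked);  h->locked = 0; }

static int host_add(struct host *h, int id)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return -1;
	n->id = id;
	host_lock(h);
	n->next = h->head;
	h->head = n;
	host_unlock(h);
	return 0;
}

/* Unlink and free one node. Takes the lock itself, so the caller must not hold it. */
static void host_remove_node(struct host *h, struct node *victim)
{
	struct node **pp;

	host_lock(h);
	for (pp = &h->head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			break;
		}
	}
	h->removed++;
	host_unlock(h);
	free(victim);
}

/*
 * Remove every node. The lock is dropped around each removal, so the
 * list may change underneath us; restart from the head each time.
 */
static void host_forget(struct host *h)
{
	struct node *n;

restart:
	host_lock(h);
	for (n = h->head; n; n = n->next) {
		if (n->deleted)
			continue;
		n->deleted = 1;
		host_unlock(h);
		host_remove_node(h, n);
		goto restart;
	}
	host_unlock(h);
}
```

The restart costs extra passes over the list, but no pointer saved before the lock was dropped is ever dereferenced afterwards.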
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
The original API returned either an ERR_PTR() or a refcounted sdev.
Unfortunately, if it's successful, you need to do a scsi_device_put() on
the sdev otherwise the refcounting is wrong.
Everyone seems to expect that scsi_add_device() should be callable
without doing the ref put, so alter the API so it is (we still have
__scsi_add_device with the original behaviour).
The only actual caller that needs altering is the one in firewire ...
not because it gets this right, but because it acts on the error if one
is returned.
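The shape of the new API can be sketched in userspace as a thin wrapper that drops the reference on success and turns an error pointer into an error code. Everything here is a hypothetical model: err_ptr(), is_err() and the fake_* functions only imitate the ERR_PTR() convention and are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Minimal imitation of the error-pointer convention. */
static void *err_ptr(long err)       { return (void *)(intptr_t)err; }
static long ptr_err(const void *p)   { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct fake_sdev { int refcount; };

static struct fake_sdev demo_sdev;   /* the only device in this model */

/* Original behaviour: returns an error pointer or a referenced device. */
static struct fake_sdev *fake_add_device_ref(int lun)
{
	if (lun < 0)
		return err_ptr(-ENODEV);
	demo_sdev.refcount = 2;      /* one for the midlayer, one for the caller */
	return &demo_sdev;
}

static void fake_device_put(struct fake_sdev *sdev)
{
	assert(sdev->refcount > 0);
	sdev->refcount--;
}

/* New behaviour: callers get 0 or a negative errno and hold no reference. */
static int fake_add_device(int lun)
{
	struct fake_sdev *sdev = fake_add_device_ref(lun);

	if (is_err(sdev))
		return (int)ptr_err(sdev);
	fake_device_put(sdev);
	return 0;
}
```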
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
This patch (as546) fixes an oops-causing failure to check the return code
from scsi_device_get. The call can return an error if the LLD is being
unloaded from memory.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
This patch (as543) adds a private entry point to scsi_scan_target, for use
when the caller already owns the scan_mutex, and updates the kerneldoc for
that routine (which was badly out-of-date). It converts scsi_scan_channel
to use the new entry point. Lastly, it modifies scsi_get_host_dev to make
it acquire the scan_mutex, necessary since the routine adds a new
scsi_device even if it doesn't do any actual scanning.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
This one removes struct scsi_request entirely from sd. In the process,
I noticed we have no callers of scsi_wait_req who don't immediately
normalise the sense, so I updated the API to make it take a struct
scsi_sense_hdr instead of simply a big sense buffer.
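To illustrate what "normalising the sense" means, here is a simplified userspace decoder that reduces a raw sense buffer to a small header. It assumes the standard layouts (response codes 0x70/0x71 for fixed format, 0x72/0x73 for descriptor format); struct sense_hdr and normalize_sense() are hypothetical names and skip checks a real implementation would make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced header; not the kernel's struct scsi_sense_hdr. */
struct sense_hdr {
	uint8_t response_code;
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* Returns 1 if buf holds recognisable sense data, 0 otherwise. */
static int normalize_sense(const uint8_t *buf, size_t len, struct sense_hdr *hdr)
{
	assert(hdr != NULL);
	memset(hdr, 0, sizeof(*hdr));
	if (!buf || len == 0)
		return 0;

	hdr->response_code = buf[0] & 0x7f;
	if (hdr->response_code < 0x70 || hdr->response_code > 0x73)
		return 0;

	if (hdr->response_code >= 0x72) {
		/* Descriptor format: key, ASC and ASCQ sit in bytes 1..3. */
		if (len > 1)
			hdr->sense_key = buf[1] & 0x0f;
		if (len > 2)
			hdr->asc = buf[2];
		if (len > 3)
			hdr->ascq = buf[3];
	} else {
		/* Fixed format: key in byte 2, ASC/ASCQ in bytes 12 and 13. */
		if (len > 2)
			hdr->sense_key = buf[2] & 0x0f;
		if (len > 13) {
			hdr->asc = buf[12];
			hdr->ascq = buf[13];
		}
	}
	return 1;
}
```

A caller that receives the header directly no longer needs to know which of the two layouts the device used.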
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
|
|
Original From: Mike Christie <michaelc@cs.wisc.edu>
Add scsi_execute_req() as a replacement for scsi_wait_req()
Fixed up various pieces (added REQ_SPECIAL and caught req use after
free)
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
We have some nasty issues with 2.6.12-rc6. Any request to scan on
the lpfc or qla2xxx FC adapters will oops. What is happening is the
system is defaulting to non-transport registered targets, which
inherit the parent of the scan. On this second scan, performed by
the attribute, the parent becomes the shost instead of the rport.
The slave functions in the 2 FC adapters use starget_to_rport()
routines, which incorrectly map the shost as an rport pointer.
Additionally, this pointed out other weaknesses:
- If the target structure is torn down outside of the transport,
we have no method for it to be regenerated at the proper parent.
- We have race conditions on the target being allocated by both
the midlayer scan (parent=shost) and by the fc transport
(parent=rport).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
Add support to not allow additions to a host when it is being removed.
Signed-off-by: Mike Anderson <andmike@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
Adds a missing check for an error return code from scsi_sysfs_add_sdev.
This resolves entry #4863 in the OSDL bugzilla. Although in that bug
report the failure occurred because of a confusion over scanning vs.
rescanning, in general add_sdev can fail for a number of reasons (the
simplest being insufficient memory) and the caller should cope properly.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
One of the issues we had was reverting the midlayer's lun value
into the 8-byte lun value that we wanted to send to the device.
Historically, there's been some combination of byte swapping,
setting high/low, etc. There's also been no common thread between
how our driver did it and others. I also got very confused as
to why byteswap routines were being used.
Anyway, this patch is a LLDD-callable function that reverts the
midlayer's lun value, stored in an int, to the 8-byte quantity
(note: this is not the real 8-byte quantity, just the same amount
that scsilun_to_int() was able to convert and store originally).
This also solves the dilemma of the thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=112116767118981&w=2
A patch for the lpfc driver to use this function will be along
in a few days (batched with other patches).
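A round trip between the int form and the 8-byte form might look like the userspace sketch below. The byte layout is an assumption: each 16-bit level is stored big-endian, lowest level first, and only as many bytes as fit in an unsigned int (assumed to be 32 bits) are filled. struct lun8, int_to_lun8() and lun8_to_int() are hypothetical names, not the midlayer's functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 8-byte LUN sent to the device. */
struct lun8 {
	uint8_t b[8];
};

/* Expand an int LUN into the 8-byte form; unused bytes stay zero. */
static void int_to_lun8(unsigned int lun, struct lun8 *out)
{
	size_t i;

	assert(out != NULL);
	memset(out->b, 0, sizeof(out->b));
	for (i = 0; i < sizeof(lun); i += 2) {
		out->b[i] = (lun >> 8) & 0xff;   /* high byte of this level */
		out->b[i + 1] = lun & 0xff;      /* low byte of this level */
		lun >>= 16;
	}
}

/* Inverse direction: fold the first bytes back into an int. */
static unsigned int lun8_to_int(const struct lun8 *in)
{
	unsigned int lun = 0;
	size_t i;

	assert(in != NULL);
	for (i = 0; i < sizeof(lun); i += 2)
		lun |= (((unsigned int)in->b[i] << 8) | in->b[i + 1]) << (i * 8);
	return lun;
}
```

No byte swapping helper is involved, so the result is the same on little-endian and big-endian hosts.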
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
|
|
With CONFIG_DEBUG_SLAB=y I see slab corruption messages during boot on
pSeries machines with IPR adapters with any 2.6.12-rc kernel.
The change which seems to have introduced the problem is "SCSI: revamp
target scanning routines" and may be found at:
http://marc.theaimsgroup.com/?l=bk-commits-head&m=111093946426333&w=2
In order to revert that in a 2.6.12-rc1 tree, I had to revert "target
code updates to support scanned targets" first:
http://marc.theaimsgroup.com/?l=bk-commits-head&m=111094132524649&w=2
With both patches reverted, the corruption messages go away.
ipr: IBM Power RAID SCSI Device Driver version: 2.0.13 (February 21, 2005)
ipr 0001:d0:01.0: Found IOA with IRQ: 167
ipr 0001:d0:01.0: Starting IOA initialization sequence.
ipr 0001:d0:01.0: Adapter firmware version: 020A005C
ipr 0001:d0:01.0: IOA initialized.
scsi0 : IBM 570B Storage Adapter
Vendor: IBM Model: VSBPD4E1 U4SCSI Rev: 4770
Type: Enclosure ANSI SCSI revision: 02
Vendor: IBM H0 Model: HUS103036FL3800 Rev: RPQF
Type: Direct-Access ANSI SCSI revision: 04
Vendor: IBM H0 Model: HUS103036FL3800 Rev: RPQF
Type: Direct-Access ANSI SCSI revision: 04
Vendor: IBM H0 Model: HUS103036FL3800 Rev: RPQF
Type: Direct-Access ANSI SCSI revision: 04
Vendor: IBM H0 Model: HUS103036FL3800 Rev: RPQF
Type: Direct-Access ANSI SCSI revision: 04
Vendor: IBM Model: VSBPD4E1 U4SCSI Rev: 4770
Type: Enclosure ANSI SCSI revision: 02
Slab corruption: start=c0000001e8de5268, len=512
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c00000000029c3a0>](.scsi_target_dev_release+0x28/0x50)
080: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6a
Prev obj: start=c0000001e8de5050, len=512
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<0000000000000000>](0x0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=c0000001e8de5480, len=512
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c000000000228d7c>](.as_init_queue+0x5c/0x228)
000: c0 00 00 01 e8 83 26 08 00 00 00 00 00 00 00 00
010: 00 00 00 00 00 00 00 00 c0 00 00 01 e8 de 54 98
Slab corruption: start=c0000001e8de5268, len=512
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c00000000029c3a0>](.scsi_target_dev_release+0x28/0x50)
080: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6a
Prev obj: start=c0000001e8de5050, len=512
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<0000000000000000>](0x0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=c0000001e8de5480, len=512
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c000000000228d7c>](.as_init_queue+0x5c/0x228)
000: c0 00 00 01 e8 83 26 08 00 00 00 00 00 00 00 00
010: 00 00 00 00 00 00 00 00 c0 00 00 01 e8 de 54 98
...
I did some digging and the problem seems to be a refcounting issue in
__scsi_add_device. The target gets freed in scsi_target_reap, and
then __scsi_add_device tries to do another device_put on it.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
This gives the HBA driver notice when a target is created and
destroyed to allow it to manage its own target based allocations
accordingly.
This is a much reduced version of the original patch sent in by
James.Smart@Emulex.com
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
a) TYPE_SDAD renamed to TYPE_RBC and taken to scsi.h
b) in sbp2.c remapping of TYPE_RBC to TYPE_DISK turned off
c) relevant places in midlayer and sd.c taught to accept TYPE_RBC
d) sd.c::sd_read_cache_type() looks into page 6 when dealing with
TYPE_RBC - these guys have writeback cache flag there and are not guaranteed
to have page 8 at all.
e) sd_read_cache_type() got an extra sanity check - it checks that
it got the page it asked for before using its contents. And screams if
mismatch had happened. Rationale: there are broken devices out there that
are "helpful" enough to go for "I don't have a page you've asked for, here,
have another one". For example, PL3507 had been caught doing just that...
f) sbp2 sets sdev->use_10_for_rw and sdev->use_10_for_ms instead
of bothering to remap READ6/WRITE6/MODE_SENSE, so most of the conversions
in there are gone now.
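The sanity check from item e) can be sketched as follows for a MODE SENSE(10) reply. The layout is an assumption stated here: an 8-byte header with the block descriptor length in bytes 6 and 7, followed by the page, whose code is the low six bits of its first byte. find_mode_page10() is a hypothetical helper, not the code in sd.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return a pointer to the mode page in a MODE SENSE(10) reply, or NULL
 * if the reply is too short or the device sent a different page than
 * the one that was requested.
 */
static const uint8_t *find_mode_page10(const uint8_t *buf, size_t len,
				       uint8_t wanted)
{
	size_t bd_len, off;

	assert(buf != NULL);
	if (len < 8)
		return NULL;

	/* Skip the header and any block descriptors. */
	bd_len = ((size_t)buf[6] << 8) | buf[7];
	off = 8 + bd_len;
	if (off >= len)
		return NULL;

	/* Mask off the PS and SPF bits before comparing page codes. */
	if ((buf[off] & 0x3f) != wanted)
		return NULL;

	return buf + off;
}
```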
Incidentally, I wonder if USB storage devices that have no
mode page 8 are simply RBC ones. I haven't touched that, but it might
be interesting to check...
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
Somebody forgot that | has higher priority than ?:. As a result,
allocation is done with bogus flags - instead of GFP_ATOMIC + possibly
GFP_DMA we always get GFP_DMA and no GFP_ATOMIC.
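The precedence trap is easy to demonstrate in plain C. The flag values below are made up for illustration (the real GFP_* constants differ), and the "buggy" variant is written with explicit parentheses that match how the unparenthesised expression parses.

```c
#include <assert.h>

/* Illustrative flag values only; the real GFP_* constants differ. */
#define FAKE_GFP_DMA    0x01u
#define FAKE_GFP_ATOMIC 0x20u

/*
 * How "FAKE_GFP_ATOMIC | needs_dma ? FAKE_GFP_DMA : 0" is parsed:
 * | binds tighter than ?:, so the condition is always non-zero and
 * the result is always FAKE_GFP_DMA, with FAKE_GFP_ATOMIC lost.
 */
static unsigned int alloc_flags_buggy(int needs_dma)
{
	assert(needs_dma == 0 || needs_dma == 1);
	return (FAKE_GFP_ATOMIC | needs_dma) ? FAKE_GFP_DMA : 0;
}

/* Intended meaning: always atomic, plus DMA only when requested. */
static unsigned int alloc_flags_fixed(int needs_dma)
{
	assert(needs_dma == 0 || needs_dma == 1);
	return FAKE_GFP_ATOMIC | (needs_dma ? FAKE_GFP_DMA : 0);
}
```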
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The current problem seen is that the queue lock is actually in the
SCSI device structure, so when that structure is freed on device
release, we go boom if the queue tries to access the lock again.
The fix here is to move the lock from the scsi_device to the queue.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!
|