Tape - Multiple Drives Showing Status of Not functional

(Doc ID 1363076.1)

Last updated on SEPTEMBER 22, 2017

Applies to:

Sun StorageTek 9940 Tape Drive - Version Not Applicable and later
LTO Tape Drive - Version Not Applicable and later
Sun StorageTek T10000 Tape Drive - Version Not Applicable and later
Sun StorageTek 9840 Tape Drive - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms

This instance involved HP LTO3 drives in an L180 library.
There were various issues; the main one perceived by the customer was tapes being left in drives, with FSC 3e24 recorded in the L180.

Here is the L180 FSC log.
The figure before the date is the number of times the event has occurred since the library was last reset.
The date and time are those of the most recent occurrence.

3e13 Warning DRIVE_02_00 2 9/27/2011 18:39:20 DRV Drive not responding
3e0a Warning DRIVE_02_00 1 9/27/2011 18:38:50 DRV TTI communication to drive failed
3e1c Warning DRIVE_09_00 1 9/27/2011 18:35:39 DRV drive reset detected
3e13 Warning DRIVE_01_00 3 9/27/2011 18:35:38 DRV Drive not responding
3e1c Warning DRIVE_00_00 1 9/27/2011 18:35:38 DRV drive reset detected
309e Warning NONE 3 9/27/2011 18:33:0 Cartridge access door is open. All robot activity in the library has been aborted.
3e24 Warning DRIVE_09_00 3 9/27/2011 18:25:31 DRV Wait for Rewind Failed - Loaded State reported during wait
3e24 Warning DRIVE_01_00 6 9/27/2011 18:25:5 DRV Wait for Rewind Failed - Loaded State reported during wait
3e24 Warning DRIVE_00_00 4 9/27/2011 18:23:23 DRV Wait for Rewind Failed - Loaded State reported during wait
3e24 Warning DRIVE_05_00 1 9/27/2011 18:8:12 DRV Wait for Rewind Failed - Loaded State reported during wait
3a32 Warning NONE 7 9/27/2011 17:41:24 OPI unable to get the Clean Warn Count Info From IFM.
3e1c Warning DRIVE_05_00 1 9/27/2011 17:6:59 DRV drive reset detected
3e1c Warning DRIVE_07_00 1 9/27/2011 17:6:58 DRV drive reset detected
3e1c Warning DRIVE_06_00 1 9/27/2011 17:6:58 DRV drive reset detected
3e13 Warning DRIVE_08_00 1 9/27/2011 17:6:57 DRV Drive not responding
3e14 Informational DRIVE_09_00 1 9/27/2011 17:6:50 DRV Drive not connected
3e14 Informational DRIVE_03_00 1 9/27/2011 17:4:39 DRV Drive not connected
3e14 Informational DRIVE_04_00 1 9/27/2011 17:4:38 DRV Drive not connected
3e14 Informational DRIVE_00_00 1 9/27/2011 17:4:34 DRV Drive not connected
3e14 Informational DRIVE_01_00 1 9/27/2011 17:4:33 DRV Drive not connected
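
The log layout described above can be split into fields with a short script. The following is a minimal Python sketch, not a supported tool: the `parse_fsc_line` helper and its field names are hypothetical, and the layout is inferred only from the entries shown in this log.

```python
import re
from datetime import datetime

# Layout inferred from the entries above (an assumption, not a specification):
# FSC, severity, device, count, date, time, message.
FSC_LINE = re.compile(
    r"^(?P<fsc>[0-9a-f]{4})\s+(?P<severity>\w+)\s+(?P<device>\S+)\s+"
    r"(?P<count>\d+)\s+(?P<date>\d+/\d+/\d+)\s+(?P<time>\d+:\d+:\d+)\s+"
    r"(?P<message>.*)$"
)

def parse_fsc_line(line):
    """Return a dict for one FSC log line, or None if it does not match."""
    m = FSC_LINE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["count"] = int(entry["count"])
    # Minutes and seconds are not zero-padded in this log (for example 18:8:12);
    # strptime accepts single-digit fields.
    entry["last_seen"] = datetime.strptime(
        entry["date"] + " " + entry["time"], "%m/%d/%Y %H:%M:%S"
    )
    return entry

sample = ("3e24 Warning DRIVE_05_00 1 9/27/2011 18:8:12 "
          "DRV Wait for Rewind Failed - Loaded State reported during wait")
entry = parse_fsc_line(sample)
print(entry["fsc"], entry["device"], entry["count"], entry["last_seen"])
```

Because the timestamp is that of the most recent occurrence only, sorting on `last_seen` orders the last sighting of each event, not every occurrence.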

FSC 3e24 errors are seen, but as usual they are a symptom, not the problem.
FSC 3e24 means the drive was found not to have unloaded in time for a dismount command.
On an L180/700, FSC 3e24 usually means that a server process died, a connection was lost, there is a misconfiguration, or something else interrupted the life cycle of the mount.
It is not a hardware failure.

Also seen are:
FSC 3e13 Drive not responding, again a symptom,
and
FSC 3e0a TTI communication to drive failed, another symptom.

Fortunately, the cause of the mayhem is visible right here in the L180 FSC log:
FSC 3e1c drive reset detected.
A reset causes the affected drives to stop communicating with the library for a while, which produces other errors such as
3e13 Warning DRIVE_08_00 1 9/27/2011 17:6:57 DRV Drive not responding
3e14 Informational DRIVE_09_00 1 9/27/2011 17:6:50 DRV Drive not connected

The 3e24 entries, such as
3e24 Warning DRIVE_05_00 1 9/27/2011 18:8:12 DRV Wait for Rewind Failed - Loaded State reported during wait
result from events higher in the stack, that is, outside the drive and the library.
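
The reasoning above, that 3e1c is the cause and the other codes are follow-on symptoms, can be sketched as a simple grouping by drive. This is an illustrative Python sketch only: the `CAUSE` and `SYMPTOMS` sets reflect the discussion in this note, not an official FSC reference, and the event list is a hand-copied subset of the log.

```python
from collections import defaultdict

# Classification taken from the discussion in this note (an assumption,
# not an official FSC reference).
CAUSE = {"3e1c"}                             # drive reset detected
SYMPTOMS = {"3e13", "3e14", "3e0a", "3e24"}  # follow-on errors

# (FSC, device) pairs copied from a few of the FSC log entries.
events = [
    ("3e1c", "DRIVE_09_00"),
    ("3e24", "DRIVE_09_00"),
    ("3e14", "DRIVE_09_00"),
    ("3e1c", "DRIVE_05_00"),
    ("3e24", "DRIVE_05_00"),
    ("3e13", "DRIVE_08_00"),
]

# Collect the set of codes logged against each drive.
by_drive = defaultdict(set)
for fsc, device in events:
    by_drive[device].add(fsc)

for device in sorted(by_drive):
    codes = by_drive[device]
    if codes & CAUSE:
        note = "reset detected; treat the other codes as follow-on symptoms"
    else:
        note = "symptoms only; no reset logged for this drive"
    print(device, sorted(codes), note)
```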

Drive resets are catastrophic for any operation in progress.
However, they are attempts at recovery, not failures.
They originate externally, not from the drive or the library.
The drive must obey and reset itself regardless of what it is doing at the time.
Please do not replace library FRUs or drives.

A reset, like a timeout, is not a failure.
It is a recovery action.

The customer was able to provide an explorer from a media server.
Again we see chaos caused by drives being reset.

The drives are reporting via SCSI sense data that they have been reset.

Sep 25 05:32:31 scsi: [ID 107833 kern.notice] ASC: 0x29 (bus device reset message occurred), ASCQ: 0x3, FRU: 0x0
Sep 25 06:12:36 scsi: [ID 107833 kern.notice] ASC: 0x29 (bus device reset message occurred), ASCQ: 0x3, FRU: 0x0
Sep 25 06:52:41 scsi: [ID 107833 kern.notice] ASC: 0x29 (bus device reset message occurred), ASCQ: 0x3, FRU: 0x0
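
When several reset reports appear, it can help to check the spacing between them. The following is a minimal Python sketch using only the three timestamps shown above; the `YEAR` value is an assumption taken from the FSC log dates, because syslog lines carry no year. What a regular interval means at a given site is for the SAN administrator to determine.

```python
from datetime import datetime

YEAR = 2011  # assumption: syslog lines carry no year; taken from the FSC log

# Timestamps copied from the three ASC 0x29 messages.
stamps = ["Sep 25 05:32:31", "Sep 25 06:12:36", "Sep 25 06:52:41"]
times = [datetime.strptime(f"{YEAR} {s}", "%Y %b %d %H:%M:%S") for s in stamps]

# Interval between each pair of consecutive reports.
gaps = [later - earlier for earlier, later in zip(times, times[1:])]
for gap in gaps:
    print(gap)  # each gap here is 0:40:05
```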

The FC driver can also see the drives being reset.
(This initiator did not issue the resets.)
Sep 23 23:30:07 FCP: WWN 0x500104f00059f3c9 reset successfully
Sep 23 23:33:42 FCP: WWN 0x500104f00059f3c9 reset successfully
Sep 23 23:33:43 FCP: WWN 0x500104f00059f3c9 reset successfully
Sep 23 23:37:06 FCP: WWN 0x500104f00059f3c9 reset successfully
Sep 23 23:40:26 FCP: WWN 0x500104f00059f3c9 reset successfully
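
To see which ports are being reset most often, the FCP messages can be counted per WWN. This is a small Python sketch under the assumption that the message text always matches the form shown above; the `RESET` pattern is derived from these five lines only.

```python
import re
from collections import Counter

# Pattern derived from the messages above (an assumption about the format).
RESET = re.compile(r"FCP: WWN (0x[0-9a-f]+) reset successfully")

lines = [
    "Sep 23 23:30:07 FCP: WWN 0x500104f00059f3c9 reset successfully",
    "Sep 23 23:33:42 FCP: WWN 0x500104f00059f3c9 reset successfully",
    "Sep 23 23:33:43 FCP: WWN 0x500104f00059f3c9 reset successfully",
    "Sep 23 23:37:06 FCP: WWN 0x500104f00059f3c9 reset successfully",
    "Sep 23 23:40:26 FCP: WWN 0x500104f00059f3c9 reset successfully",
]

# Count reset messages per WWN, most frequent first.
resets = Counter(m.group(1) for m in map(RESET.search, lines) if m)
for wwn, count in resets.most_common():
    print(wwn, count)
```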


There are also SAN configuration and device allocation issues, resulting in reservation conflicts.

Sep 25 16:33:17 avrd[15582]: [ID 867986 daemon.notice] Reservation Conflict status from HP.ULTRIUM4-SCSI.009 (device 15)
Sep 26 04:01:16 avrd[15582]: [ID 205616 daemon.notice] Reservation Conflict status from HP.ULTRIUM4-SCSI.003 (device 3)
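
The drives involved in reservation conflicts can be pulled out of the avrd messages in the same way. A minimal Python sketch, assuming the message wording shown above; the `CONFLICT` pattern is hypothetical and based only on these two lines.

```python
import re

# Pattern based on the two avrd messages above (an assumption about the format).
CONFLICT = re.compile(
    r"Reservation Conflict status from (?P<name>\S+) \(device (?P<index>\d+)\)"
)

lines = [
    "Sep 25 16:33:17 avrd[15582]: [ID 867986 daemon.notice] "
    "Reservation Conflict status from HP.ULTRIUM4-SCSI.009 (device 15)",
    "Sep 26 04:01:16 avrd[15582]: [ID 205616 daemon.notice] "
    "Reservation Conflict status from HP.ULTRIUM4-SCSI.003 (device 3)",
]

# Collect (drive name, device index) for every conflict reported.
conflicts = []
for line in lines:
    m = CONFLICT.search(line)
    if m:
        conflicts.append((m.group("name"), int(m.group("index"))))

print(conflicts)
```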

Devices also disappear from the fabric and then reappear:
Sep 27 16:52:34 fctl: [ID 517869 kern.warning] WARNING: fp(0)::N_x Port with D_ID=1a0800, PWWN=500104f00059f3b7 disappeared from fabric
Sep 27 16:52:35 fctl: [ID 517869 kern.warning] WARNING: fp(6)::N_x Port with D_ID=190800, PWWN=500104f00059f3bd disappeared from fabric
Sep 27 16:52:35 fctl: [ID 517869 kern.warning] WARNING: fp(6)::N_x Port with D_ID=190900, PWWN=500104f00059f3ba disappeared from fabric
Sep 27 16:52:36 fctl: [ID 517869 kern.warning] WARNING: fp(6)::N_x Port with D_ID=190a00, PWWN=500104f00059f3c0 disappeared from fabric
Sep 27 16:52:36 fctl: [ID 517869 kern.warning] WARNING: fp(0)::N_x Port with D_ID=1c1100, PWWN=500104f00059f3c3 disappeared from fabric
Sep 27 16:52:53 fctl: [ID 517869 kern.warning] WARNING: fp(0)::N_x Port with D_ID=1a0800, PWWN=500104f00059f3b7 reappeared in fabric
Sep 27 16:52:56 fctl: [ID 517869 kern.warning] WARNING: fp(6)::N_x Port with D_ID=190900, PWWN=500104f00059f3ba reappeared in fabric
Sep 27 16:53:00 fctl: [ID 517869 kern.warning] WARNING: fp(6)::N_x Port with D_ID=190800, PWWN=500104f00059f3bd reappeared in fabric
Sep 27 16:53:04 fctl: [ID 517869 kern.warning] WARNING: fp(6)::N_x Port with D_ID=190a00, PWWN=500104f00059f3c0 reappeared in fabric
Sep 27 16:53:06 fctl: [ID 517869 kern.warning] WARNING: fp(0)::N_x Port with D_ID=1c1100, PWWN=500104f00059f3c3 reappeared in fabric
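
Pairing each "disappeared" message with the matching "reappeared" message gives the length of each outage per port. The following Python sketch uses two of the ports listed above; the `YEAR` value is an assumption taken from the FSC log dates, and the `EVENT` pattern is derived only from these fctl messages.

```python
import re
from datetime import datetime

YEAR = 2011  # assumption: syslog lines carry no year; taken from the FSC log

# Pattern derived from the fctl messages above (an assumption about the format).
EVENT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d+:\d+:\d+) .*PWWN=(?P<pwwn>[0-9a-f]+) "
    r"(?P<what>disappeared from|reappeared in) fabric"
)

lines = [
    "Sep 27 16:52:34 fctl: [ID 517869 kern.warning] WARNING: fp(0)::N_x Port with D_ID=1a0800, PWWN=500104f00059f3b7 disappeared from fabric",
    "Sep 27 16:52:36 fctl: [ID 517869 kern.warning] WARNING: fp(0)::N_x Port with D_ID=1c1100, PWWN=500104f00059f3c3 disappeared from fabric",
    "Sep 27 16:52:53 fctl: [ID 517869 kern.warning] WARNING: fp(0)::N_x Port with D_ID=1a0800, PWWN=500104f00059f3b7 reappeared in fabric",
    "Sep 27 16:53:06 fctl: [ID 517869 kern.warning] WARNING: fp(0)::N_x Port with D_ID=1c1100, PWWN=500104f00059f3c3 reappeared in fabric",
]

gone = {}      # PWWN -> time it disappeared
outages = {}   # PWWN -> outage length in seconds
for line in lines:
    m = EVENT.match(line)
    if m is None:
        continue
    when = datetime.strptime(f"{YEAR} {m.group('stamp')}", "%Y %b %d %H:%M:%S")
    pwwn = m.group("pwwn")
    if m.group("what").startswith("disappeared"):
        gone[pwwn] = when
    elif pwwn in gone:
        outages[pwwn] = (when - gone.pop(pwwn)).total_seconds()

for pwwn, seconds in outages.items():
    print(pwwn, seconds)
```

Ports that disappear together and return within seconds of each other suggest a fabric-level event rather than a fault in any one drive.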

There are other unusual non-hardware errors occurring here too.

Sep 27 11:12:22 scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/SUNW,emlxs@1/fp@0,0/st@w500104f0009cabff,0 (st5):
Sep 27 11:12:22 scsi: [ID 107833 kern.notice] ASC: 0x2a (mode parameters changed), ASCQ: 0x1, FRU: 0x0
Sep 27 15:05:00 scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/SUNW,emlxs@1,1/fp@0,0/st@w500104f00059f3bd,0 (st14):
Sep 27 15:05:00 scsi: [ID 107833 kern.notice] ASC: 0x28 (medium may have changed), ASCQ: 0x0, FRU: 0x0

(Both are SCSI sense key 0x6 (Unit Attention) events, not media or hardware errors.)

Some of these also occur on disks.
Are disks zoned in with the tapes?
Sep 27 01:26:34 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g4849544143484920373730313336323430303333 (ssd32):
Sep 27 01:26:34 scsi: [ID 107833 kern.notice] ASC: 0x2a (parameters changed), ASCQ: 0x0, FRU: 0x0
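
To check whether the same kind of event is hitting both tapes and disks, each WARNING line can be paired with the ASC line that follows it and labelled by driver. This is a minimal Python sketch: the `INSTANCE` and `ASC_LINE` patterns are assumptions based on the messages in this note, and the `KIND` mapping relies on `st` and `ssd` being the Solaris tape and disk driver names.

```python
import re

# Patterns based on the messages in this note (assumptions about the format).
INSTANCE = re.compile(r"\((?P<driver>[a-z]+)(?P<unit>\d+)\):\s*$")
ASC_LINE = re.compile(
    r"ASC: (?P<asc>0x[0-9a-f]+) \((?P<text>[^)]*)\), ASCQ: (?P<ascq>0x[0-9a-f]+)"
)
KIND = {"st": "tape", "ssd": "disk"}  # Solaris st and ssd driver names

lines = [
    "Sep 27 15:05:00 scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/SUNW,emlxs@1,1/fp@0,0/st@w500104f00059f3bd,0 (st14):",
    "Sep 27 15:05:00 scsi: [ID 107833 kern.notice] ASC: 0x28 (medium may have changed), ASCQ: 0x0, FRU: 0x0",
    "Sep 27 01:26:34 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g4849544143484920373730313336323430303333 (ssd32):",
    "Sep 27 01:26:34 scsi: [ID 107833 kern.notice] ASC: 0x2a (parameters changed), ASCQ: 0x0, FRU: 0x0",
]

events = []
current = None  # driver name from the most recent WARNING line
for line in lines:
    inst = INSTANCE.search(line)
    if inst:
        current = inst.group("driver")
        continue
    asc = ASC_LINE.search(line)
    if asc and current:
        events.append((
            KIND.get(current, current),
            int(asc.group("asc"), 16),
            int(asc.group("ascq"), 16),
            asc.group("text"),
        ))
        current = None

for kind, asc_code, ascq_code, text in events:
    print(f"{kind}: ASC 0x{asc_code:02x} ASCQ 0x{ascq_code:02x} ({text})")
```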


Neither the explorer nor the FSC log shows any tape drive or library hardware problem, so please stop replacing hardware in the L180.


Cause
