ACSLS - Encryption-enabled drives show "in Use" and volsers show up only after backup has failed or timed out (Doc ID 1579696.1)

Last updated on NOVEMBER 17, 2016

Applies to:

Sun StorageTek Auto Cartridge Sys Lib SW (ACSLS) - Version 7.1 to 8.2 [Release 7.0 to 8.0]
Oracle Key Manager - Version 2.2 to 2.5.2 [Release 2.0]
Information in this document applies to any platform.

Symptoms

ACSLS 'query drive all' shows drives "in use" but without any volser for a very long time.
When the volser is finally displayed, the backup job either has failed or has timed out.

The drives are encrypted-enabled drives and enrolled with the Oracle Key Management software.

Netbackup shows status 85 Input/Output errors during Mount operation:
---------------------------------
Note:  The mount process takes a long time to complete. When the tape is finally loaded, the backup application gets an I/O error and eventually aborts the job.


08/23/2013 18:30:00 - Info bptm (pid=1977) media id D10212 mounted on drive index 8, drivepath /dev/rmt/7cbn, drivename STK.T10000C.001, copy 2
08/23/2013 18:33:02 - Error bptm (pid=1977) read error on media id D10212, drive index 8, reading header block, I/O error
08/23/2013 18:33:03 - Info bptm (pid=1977) EXITING with status 85 <----------

08/23/2013 18:26:20 - Info bptm (pid=20259) media id D10426 mounted on drive index 11, drivepath /dev/nst1, drivename STK.T10000C.000, copy 2
08/23/2013 18:29:22 - Error bptm (pid=20259) read error on media id D10426, drive index 11, reading header block, Input/output error
08/23/2013 18:29:23 - Info bptm (pid=20259) EXITING with status 85 <----------

08/23/2013 18:21:08 - Waiting for scan drive stop StkT10kd6, Media server: mserver
08/23/2013 18:21:10 - granted resource DC7562
08/23/2013 18:21:10 - granted resource StkT10kd6
08/23/2013 18:21:10 - granted resource mserver-hcart3-robot-acs-4
08/23/2013 18:37:30 - Error bpduplicate (pid=1197) host mserver backup id stanli_1377153409 read failed, media manager killed by signal (82).
08/23/2013 18:37:30 - Error bpduplicate (pid=1197) host mserver backupid stanli_1377153409 write failed, media read error (85).
08/23/2013 18:37:31 - Error bpduplicate (pid=1197) Duplicate of backupid stanli_1377153409 failed, media read error (85).
08/23/2013 18:37:31 - Error bpduplicate (pid=1197) Status = no images were successfully processed.


The library error warn info event log shows "Drive is loaded" and "Drive cartridge is present" during Load time:
--------------------------
2013-08-23T17:23:09.667 error 1.1.-1.1.1 3955 "Error from device Code: 601 - Drive is loaded", Data=rewindUnloaderror6441601 ,driveType=T10000c-Enc,driveFirmware=RL53.315-5.30 Rc:6441 move root hli0 16601
2013-08-23T17:23:13.637 error 1.3.-1.1.1 3955 "Error from device Code: 601 - Drive is loaded", Data=rewindUnloaderror6441601 ,driveType=T10000c-Enc,driveFirmware=RL53.315-5.30 Rc:6441 move root hli0 16801

2013-08-23T17:35:41.593 error 1.1.-1.1.1 3955 "Error from device Code: 603 - Drive cartridge is present", Data=loaderror6440603000000 D10329T2(driveType=T10000c-Enc,driveFirmware=RL53.315-5.30 Rc:6440 move root hli0 17801
2013-08-23T17:35:43.493 error 1.3.-1.1.1 3955 "Error from device Code: 603 - Drive cartridge is present", Data=loaderror6440603000000 D10212T2(driveType=T10000c-Enc,driveFirmware=RL53.315-5.30 Rc:6440 move root hli0 18001



OKM KMA list report shows KMA's not responding:
------------------------------------
3 KMA's are not responding (see RespondingService column):

KMAID                         Name       ... FailedLogin.. KMAVersion Responding RespondingService RespTime  RepLagSize KeysReadyCnt KeysReadyInBackupCnt KeysGenCnt Locked Enrolled HSMStatus
-255598260467144508 DCkma01 ....    0              build1348   true            Responding                0                   0             1002              1002                       0                false    true        Hardware
-287668531459807356 DCkma02 ....    0              build1348   true            Not Responding      85                  0             1009              519                         0                false    true       Hardware
-4395022526292076502 GBkma01 ....   0             build1348    true           Not Responding      156                 0             1002              1002                       0                false    true       Hardware
4249295192349485770 GBkma02 ....    0             build1348    true           Not Responding      114                 0             1005              1005                       0                false    true       Hardware

The dladm list shows:
$ cat dladm.txt
igb0 type: non-vlan mtu: 1500 device: igb0
igb1 type: non-vlan mtu: 1500 device: igb1
igb2 type: non-vlan mtu: 1500 device: igb2
igb3 type: non-vlan mtu: 1500 device: igb3
aggr1 type: non-vlan mtu: 1500 aggregation: key 1
key: 1 (0x0001) policy: L4 address: 0:21:28:fb:67:16 (auto)
  device address speed duplex link state
  igb2 0:21:28:fb:67:16 0 Mbps half down standby <<<<< 
  igb3 0:21:28:fb:67:17 0 Mbps half down standby <<<<<<
  key: 1 (0x0001) policy: L4 address: 0:21:28:fb:67:16 (auto)
  LACP mode: off LACP timer: short
  device activity timeout aggregatable sync coll dist defaulted expired
  igb2 passive short yes no no no no no
  igb3 passive short yes no no no no no

Changes

No known change. 
Customer noted a couple of KMA's in the KMA list are highlighted in pink

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms