ACSLS - ACSLM connection refused or STATUS_IPC_FAILURE error causing ACSLS to hang (Doc ID 1617828.1)

Last updated on JANUARY 28, 2014

Applies to:

Sun StorageTek Auto Cartridge Sys Lib SW (ACSLS) - Version 7.1 to 8.3 [Release 7.0 to 8.0]
Information in this document applies to any platform.

Symptoms

The logical library becomes unavailable at certain times during the week.

At one site running ACSLS 8.2, the following socket errors are logged in acsss_event.log on Monday nights:
----------------------------------------------------
2013-09-23 20:21:48 ACSLM[0]:
123 N cl_ipc_write.c 1 145
cl_ipc_write: Sending message to socket *82e6a8b46e46 failed on "Connection refused"

2013-09-23 20:21:48 ACSSA[0]:
1430 N sa_demux.c 1 296
IPC failure on socket *82e6a8b46e46.

2013-09-23 20:22:50 ACSLM[0]:
123 N cl_ipc_write.c 1 145
cl_ipc_write: Sending message to socket *82e6a8b46e46 failed on "Connection refused"


2013-09-30 19:12:04 ACSLM[0]:
123 N cl_ipc_write.c 1 145
cl_ipc_write: Sending message to socket *82e9a8b46e46 failed on "Connection refused"

2013-09-30 19:12:04 ACSSA[0]:
1430 N sa_demux.c 1 296
IPC failure on socket *82e9a8b46e46.
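
To check whether the socket errors really cluster on one weekday, the event log can be scanned programmatically. The following is a minimal sketch, assuming the entry layout seen in the excerpt above (a timestamp/component header line followed by detail lines). The names `parse_entries` and `weekday_tally` are hypothetical helpers written for this illustration and are not part of ACSLS.

```python
import re
from collections import Counter
from datetime import datetime

# Header line of an entry as it appears in the excerpt above,
# for example: "2013-09-23 20:21:48 ACSLM[0]:"
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.+)\[\d+\]:$")
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def parse_entries(log_text):
    """Return a list of (timestamp, component, body) tuples.

    Assumption: every entry starts with a header line matching HEADER and
    all following non-blank lines up to the next header belong to it.
    """
    entries = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            current = (stamp, match.group(2), [])
            entries.append(current)
        elif line and current is not None:
            current[2].append(line)
    return [(ts, comp, " ".join(body)) for ts, comp, body in entries]


def weekday_tally(log_text, needle='failed on "Connection refused"'):
    """Count entries containing `needle`, grouped by weekday name."""
    return Counter(
        WEEKDAYS[ts.weekday()]
        for ts, _, body in parse_entries(log_text)
        if needle in body
    )
```

Calling `weekday_tally(open("acsss_event.log").read())` returns a count per weekday. The path to the log file is site-specific and is not given here.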

 
The SMCE trace shows mount errors similar to these:
-------------------------------------------------------------------
2013-12-17 07:54:32.983 com.sun.slim.smce.control.SMCEListener (Thread-538658) - FINER: Received NEW_COMMAND,id=14e8e1,cdb=a5000000043001f70000000000000000,requestedXferLen=0,cmdXferLen=0,xferResid=0,xferGiven=0,dataDir=NONE,scsiStatus=GOOD,deviceId=1001,I=wwn.2101001B322DF4D9,T=wwn.21000024FF44C9AA,L=0
2013-12-17 07:54:32.987 com.sun.slim.smce.command.SMCECommand (Thread-538658) - FINER: SMCEMoveMediumCommand(14e8e1) allocated direct data buffer, size=0,deviceId=1,001,I=wwn.2101001B322DF4D9,T=wwn.21000024FF44C9AA,L=0
2013-12-17 07:54:32.987 com.sun.slim.smce.command.SMCEMoveMediumCommand executeCommand (Thread-538658) - FINER: MoveMediumCommand(14e8e1) source=1,072, destination=503, moveOption=NORMAL, deviceId=1,001, itl=I=wwn.2101001B322DF4D9,T=wwn.21000024FF44C9AA,L=0
2013-12-17 07:54:32.987 com.sun.slim.model.impl.LogicalAcsImpl move (Thread-538658) - FINER: ENTRY 1,072 503
2013-12-17 07:54:32.992 com.sun.slim.smce.command.SMCECommand (Thread-538658) - FINER: SMCEMoveMediumCommand(14e8e1) Completed with error, status=CHECK_CONDITION(2),key=ILLEGAL_REQUEST(5),asc/ascq=INVALID_ELEMENT_ADDRESS(2101),senseData=70000500 0000000c 00000000 210100c0 00040000,exception=ScsiInvalidElementException,reason=Invalid Element, fp=4,deviceId=1,001,I=wwn.2101001B322DF4D9,T=wwn.21000024FF44C9AA,L=0
2013-12-17 07:54:32.992 com.sun.slim.smce.control.SMCEListener (Thread-538658) - FINER: Returned SEND_SCSI_STATUS,id=14e8e1,requestedXferLen=0,cmdXferLen=0,xferResid=0,xferGiven=0,dataDir=NONE,scsiStatus=CHECK_CONDITION,scsiSense=

The TSM FC client shows mount errors like these:
-----------------------------------------------------
12/16/2013 19:07:06 ANR8943E A hardware or media error occurred during an
  operation on library SLCLIBP2 (OP=C0106C03, CC=209,
  KEY=04, ASC=44, ASCQ=00, SENSE=70.00.04.00.00.00.00.0C.0-
  0.00.00.00.44.00.00.00.00.00.00.00., Description=Device
  microcode failure detected). Refer to the Tivoli Storage
  Manager documentation on I/O error code description.
  (SESSION: 5109)
12/16/2013 19:07:06 ANR8381E ECARTRIDGE volume CS0373 could not be mounted in
  drive DRVC2 (/dev/tsmscsi/mt6). (SESSION: 5109)
12/16/2013 19:07:06 ANR9790W Request to mount volume CS0373 for library client
  ITSM06SP failed. (SESSION: 5109)
12/16/2013 19:07:06 ANR0409I Session 5109 ended for server ITSM06SP
  (Linux/x86_64). (SESSION: 5109)
12/16/2013 19:07:06 ANR0408I Session 5110 started for server ITSM06SP
  (Linux/x86_64) (Tcp/Ip) for library sharing. (SESSION:
  5110)
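
The sense data in the SMCE trace and in the TSM message can be decoded by hand or with a few lines of code. The sketch below assumes fixed-format SCSI sense data (response code 70h or 71h), where the sense key is the low nibble of byte 2 and ASC/ASCQ are bytes 12 and 13. The function name `decode_fixed_sense` and the abbreviated key table are illustrative only.

```python
import re

# Subset of SCSI sense key names; other values are reported as "UNKNOWN".
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}


def decode_fixed_sense(hex_text):
    """Decode sense key, ASC and ASCQ from fixed-format sense data.

    `hex_text` may contain spaces, dots, hyphens or line breaks between
    the hex digits, as in the log excerpts above; they are stripped first.
    """
    data = bytes.fromhex(re.sub(r"[^0-9A-Fa-f]", "", hex_text))
    if len(data) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    if data[0] & 0x7F not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = data[2] & 0x0F
    return {
        "key": key,
        "key_name": SENSE_KEYS.get(key, "UNKNOWN"),
        "asc": data[12],
        "ascq": data[13],
    }
```

Applied to the SMCE sense data `70000500 0000000c 00000000 210100c0 00040000`, this yields key 5 with ASC/ASCQ 21h/01h, matching the ILLEGAL_REQUEST and INVALID_ELEMENT_ADDRESS values in the trace. Applied to the TSM SENSE string, it yields key 4 with ASC/ASCQ 44h/00h, matching KEY=04, ASC=44, ASCQ=00 in the ANR8943E message.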


 



At other sites running earlier versions of ACSLS, ACSLS has to be restarted each week. The failure occurs every 7 days, give or take a day.
The following acsss_event.log messages indicate that the problem has occurred:
 
2010-02-10 20:51:58 ACSLM[0]:
487 N cl_ipc_read.c 1 289
cl_ipc_read: invalid byte_count detected
 
2010-02-10 20:51:58 ACSLM[0]:
1 E lm_input.c 1 202
lm_input: cl_ipc_read() unexpected status = STATUS_IPC_FAILURE
 
2010-02-10 20:51:58 ACSLM[0]:
890 N lm_main.c 2 300
lm_main: Severe Error (STATUS_IPC_FAILURE), Exiting to ACSSS
 
2010-02-10 20:51:58 storage server[0]:
354 N ss_main.c 4 871
ss_main: exit status (54), STATUS_IPC_FAILURE, received from acslm
 
2010-02-10 20:51:58 ACSLM[0]:
883 N lm_init.c 3 230
lm_init: ACSLM has been restarted, acslm state is STATE_RUN
....
....
ACSLM restarts a couple of times, but eventually these messages are logged:
---------------------------------------------------------
2010-02-10 20:53:06 ACSLM[0]:
883 N lm_init.c 3 230
lm_init: ACSLM has been restarted, acslm state is STATE_RUN
 
2010-02-10 20:53:07 storage server[0]:
361 N ss_main.c 4 1019
ss_main: acslm restarted, pid 26672
 
2010-02-10 20:53:11 ACSLM[0]:
886 N lm_input.c 1 210
lm_input: byte count(0) too small for min packet size(0) ignored
 
2010-02-10 20:53:16 ACSLM[0]:
886 N lm_input.c 1 210
lm_input: byte count(0) too small for min packet size(0) ignored
 
2010-02-10 20:53:16 ACSLM[0]:
24 N lm_resp_proc.c 1 223
lm_resp_proc: Unable to access member 9
 
2010-02-10 20:53:16 ACSLM[0]:
1 E lm_main.c 2 347
lm_main: lm_resp_proc() unexpected status = STATUS_PROCESS_FAILURE
 
2010-02-10 20:53:16 ACSLM[0]:
24 N lm_resp_proc.c 1 223
lm_resp_proc: Unable to access member 605
 

CSI and cmd_proc requests are not completing. ACSLM is hung. ACSLS has to be restarted to recover from the error.
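
To confirm the roughly weekly recurrence described above, the dates on which the failure signatures appear can be extracted from acsss_event.log and the gaps between them computed. This is a rough sketch, assuming the entry header format shown in the excerpts. The marker strings are taken from the messages above; `failure_dates` and `gaps_in_days` are hypothetical helper names.

```python
import re
from datetime import datetime

# Entry header as seen in the excerpts, e.g. "2010-02-10 20:51:58 ACSLM[0]:"
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} .+\[\d+\]:$")

# Message fragments quoted in this document that mark the failure.
MARKERS = ("STATUS_IPC_FAILURE", 'failed on "Connection refused"')


def failure_dates(log_text, markers=MARKERS):
    """Return the sorted calendar dates on which any marker was logged."""
    dates = set()
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = datetime.strptime(match.group(1), "%Y-%m-%d").date()
        elif current is not None and any(m in line for m in markers):
            dates.add(current)
    return sorted(dates)


def gaps_in_days(dates):
    """Return the number of days between consecutive failure dates."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
```

A list of gaps clustered around 7 would be consistent with the weekly pattern reported at the affected sites.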

Changes

The ACSLS server administrator is not aware of any changes made to the server.

Cause
