ACSLS HA - ACSLS Services Not Coming Online (Doc ID 1918232.1)

Last updated on AUGUST 20, 2014

Applies to:

Sun StorageTek Auto Cartridge Sys Lib SW (ACSLS) - Version 8.3 to 8.3 [Release 8.0]
Sun StorageTek Auto Cartridge Sys Lib SW (ACSLS) High Availability - Version 8.3 to 8.3 [Release 8.0]
Oracle Solaris on SPARC (64-bit)

Symptoms

Not all ACSLS services come online. The acsdb service remains in the maintenance state, as shown below:

$  acsss status
     Copyright 1989, 2013 Oracle and/or its affiliates.  All Rights Reserved.

     acsdb: maintenance
     acsls: offline

PostgreSQL dumped core. Stack trace from the core file:
---
 $ pstack postgres.116574.4000001.4865.pvsna607.core
core 'postgres.116574.4000001.4865.pvsna607.core' of 4865:      /usr/postgres/8.3/bin/postgres -D /export/home/acsdb/ACSDB1.0/data
 fee3ebd4 _lwp_kill (6, 0, 0, fee1e0f0, ffffffff, 6) + 8
 fedb29f0 abort    (0, 1, 3500a0, ffb04, feeb5518, 0) + 110
 0026a484 errfinish (2, 36b800, 36b800, 36bb9c, 0, 368800) + 26c
 00079e04 ???????? (ffbff268, 0, 1, 1, 1, 351198)
 0007a478 XLogFlush (213209b0, fdc09af8, 0, fdc09af0, 0, 36bbe8) + 278
 000748b8 RecordTransactionCommit (1, 0, 351188, 1d6fe, 0, 1) + 268
 00075148 CommitTransaction (1, 36b800, 3510ec, 0, 2, 2df000) + c8
 0007597c CommitTransactionCommand (75800, ac, 4, 0, 3510ec, 758d0) + 68
 001bc930 ???????? (3d7c28, 2c0, 315400, 315400, 368800, 1)
 001bab14 ???????? (418ac8, 4196e8, 2, 482d70, 478870, 4196c0)
 001be63c PostgresMain (51, 368800, 0, 51, 3af400, 1) + 1158
 00191280 ???????? (3d3a78, 371f4c, c81, 30fbd8, 3c2858, 4)
 001909d4 ???????? (3d3a78, ffbff858, 0, 0, 0, 3bb128)
 0018e838 ???????? (382918, 6, 5390ff21, 382918, 10, 3d3a78)
 0018e374 PostmasterMain (382400, 368000, 3af400, 18f400, 18f400, 18f400) + d24
 00140f04 main     (3, ffbffa94, 3bad50, 306800, 306800, 306800) + 210
 00047990 _start   (0, 0, 0, 0, 0, 0) + 108

PostgreSQL dumped core while ACSLS was trying to commit an update to lsmtable:
---
  2014-07-09 18:47:01 EDT acsdb 4865 0 LOG:  
      statement: update lsmtable set lsm_activity = lsm_activity + 1
                 where acs = 0 and lsm = 3
  2014-07-09 18:47:01 EDT acsdb 4865 120574 PANIC:  
      could not write to log file 0,
      segment 33 at offset 3268608, length 16384: Checksum failure
  2014-07-09 18:47:01 EDT acsdb 4865 120574 STATEMENT:  
      update lsmtable set lsm_activity = lsm_activity + 1
      where acs = 0 and lsm = 3


Message in acsss_event.log when the PostgreSQL backend crashed:
---
  2014-07-09 18:47:05 MOUNT[0]:
  1328 N di_pri_get_status_code.c Unknown 212
  di_pri_get_status_code: DBMS error.  
    Return code (08S01) and message ("[unixODBC]No response from the backend;
    No response from the backend"). DI_STATUS = DI_S_FAILURE


Message in the PostgreSQL server log (pg_log) when PostgreSQL terminated:
---
  2014-07-09 18:47:05 EDT  2752 0 LOG:  
      server process (PID 4865) was terminated by signal 6
  2014-07-09 18:47:05 EDT  2752 0 LOG:  
      terminating any other active server processes


The PostgreSQL server log (pg_log) shows these messages from the day before the database backend crashed:
---
postgresql-2014-07-08_213318.log:2014-07-08 21:33:19 EDT  2756 0 ERROR:  could not write block 16 of relation 1663/16384/17046: Checksum failure
postgresql-2014-07-08_213318.log:2014-07-08 21:33:20 EDT  2756 0 ERROR:  could not write block 16 of relation 1663/16384/17046: Checksum failure
postgresql-2014-07-08_213318.log:2014-07-08 21:33:21 EDT  2756 0 ERROR:  could not write block 16 of relation 1663/16384/17046: Checksum failure
postgresql-2014-07-08_213318.log:2014-07-08 21:33:22 EDT  2756 0 ERROR:  could not write block 16 of relation 1663/16384/17046: Checksum failure
postgresql-2014-07-08_213318.log:2014-07-08 21:33:23 EDT  2756 0 ERROR:  could not write block 16 of relation 1663/16384/17046: Checksum failure
postgresql-2014-07-08_213318.log:2014-07-08 21:33:24 EDT  2756 0 ERROR:  could not write block 16 of relation 1663/16384/17046: Checksum failure
postgresql-2014-07-08_213318.log:2014-07-08 21:33:25 EDT  2756 0 ERROR:  could not write block 16 of relation 1663/16384/17046: Checksum failure

...
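
To gauge how long such write errors had been accumulating before a crash, the log lines can be tallied per relation. The following is a minimal sketch, not part of the original note: the function name `count_checksum_failures` is hypothetical, and it assumes the log lines follow the "could not write block N of relation X: Checksum failure" format shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads PostgreSQL log lines on stdin and prints
# "<count> <relation>" for each relation that reported a
# "could not write block ... Checksum failure" error.
count_checksum_failures() {
    grep 'Checksum failure' |
        # Keep only the relation identifier (e.g. 1663/16384/17046).
        sed -n 's/.*could not write block [0-9]* of relation \([0-9/]*\):.*/\1/p' |
        sort | uniq -c |
        # Normalize the leading padding that uniq -c adds.
        awk '{print $1, $2}'
}
```

For example, `cat "$PG_LOG_DIR"/postgresql-*.log | count_checksum_failures`, where `PG_LOG_DIR` is a placeholder for wherever the pg_log files reside on the affected host. Lines that mention a checksum failure in a different form, such as the PANIC on the log file segment, are not counted by this sketch.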

 

From the system log, the backend crashed on Jul 9:
---
Jul  9 18:47:03 pvsna607 genunix: [ID 603404 kern.notice] NOTICE: core_log: postgres[4865] core dumped: /var/corefiles/postgres.116574.4000001.4865.pvsna607.core
Jul  9 18:47:36 pvsna607 svc.startd[2126]: [ID 122153 daemon.warning] svc:/application/slm/acsdb:default: Method or service exit timed out.  Killing contract 269994.
Jul  9 18:47:36 pvsna607 svc.startd[2126]: [ID 748625 daemon.error] application/slm/acsdb:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Jul  9 18:47:36 pvsna607 svc.startd[2126]: [ID 748625 daemon.error] application/slm/acsdb:default failed: transitioned to maintenance (see 'svcs -xv' for details)
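
When reviewing a longer system log, the services that svc.startd moved into maintenance can be pulled out with a small filter. This is a rough sketch under stated assumptions: the function name `maintenance_transitions` is hypothetical, and it assumes syslog lines in the format shown above, with a 15-character timestamp at the start of each line.

```shell
#!/bin/sh
# Hypothetical helper: reads syslog lines on stdin and prints
# "<timestamp> <service>" for each svc.startd message reporting that a
# service "failed: transitioned to maintenance". Repeated identical
# messages are collapsed into one line.
maintenance_transitions() {
    sed -n 's/^\(.\{15\}\) .*] \([^ ]*\) failed: transitioned to maintenance.*/\1 \2/p' |
        sort -u
}
```

A possible invocation is `maintenance_transitions < /var/adm/messages`, assuming the system log is kept in the default Solaris location.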

Changes

Upgrade of ACSLS HA to version 8.3.

Cause
