ACSLS HA - ACSLS Services Not Coming Online
(Doc ID 1918232.1)
Last updated on SEPTEMBER 09, 2021
Applies to:
Sun StorageTek Auto Cartridge Sys Lib SW (ACSLS) - Version 8.3 to 8.3 [Release 8.0]
Sun StorageTek Auto Cartridge Sys Lib SW (ACSLS) High Availability - Version 8.3 to 8.3 [Release 8.0]
Oracle Solaris on SPARC (64-bit)
Symptoms
Not all ACSLS services are coming online. The acsdb service remains in maintenance mode as shown below:
$ acsss status
Copyright 1989, 2013 Oracle and/or its affiliates. All Rights Reserved.
acsdb: maintenance
acsls: offline
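A minimal sketch of how the status output above could be checked from a script. It assumes each service is reported on its own line as `name: state` and that a healthy service reports the state `online`; the function names `parse_acsss_status` and `not_online` are hypothetical helpers, not part of ACSLS.

```python
def parse_acsss_status(text):
    """Return {service: state} from lines shaped like 'acsdb: maintenance'."""
    states = {}
    for line in text.splitlines():
        name, sep, state = line.partition(":")
        name, state = name.strip(), state.strip()
        # Skip the copyright banner and anything that is not 'name: state'.
        if sep and name and state and " " not in name and " " not in state:
            states[name] = state
    return states


def not_online(states):
    """List services whose state is anything other than 'online' (assumed healthy state)."""
    return sorted(name for name, state in states.items() if state != "online")


if __name__ == "__main__":
    sample = (
        "Copyright 1989, 2013 Oracle and/or its affiliates. All Rights Reserved.\n"
        "acsdb: maintenance\n"
        "acsls: offline\n"
    )
    print(not_online(parse_acsss_status(sample)))
```

In practice the text would come from capturing the output of `acsss status` rather than from a hard-coded sample.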
Postgres dumped core:
---
$ pstack postgres.XXXXXX.4000001.XXXX.hostname.core
core 'postgres.XXXXXX.4000001.XXXX.hostname.core' of 4865: /usr/postgres/8.3/bin/postgres -D /export/home/acsdb/ACSDB1.0/data
fee3ebd4 _lwp_kill (6, 0, 0, fee1e0f0, ffffffff, 6) + 8
fedb29f0 abort (0, 1, 3500a0, ffb04, feeb5518, 0) + 110
0026a484 errfinish (2, 36b800, 36b800, 36bb9c, 0, 368800) + 26c
00079e04 ???????? (ffbff268, 0, 1, 1, 1, 351198)
0007a478 XLogFlush (213209b0, fdc09af8, 0, fdc09af0, 0, 36bbe8) + 278
000748b8 RecordTransactionCommit (1, 0, 351188, 1d6fe, 0, 1) + 268
00075148 CommitTransaction (1, 36b800, 3510ec, 0, 2, 2df000) + c8
0007597c CommitTransactionCommand (75800, ac, 4, 0, 3510ec, 758d0) + 68
001bc930 ???????? (3d7c28, 2c0, 315400, 315400, 368800, 1)
001bab14 ???????? (418ac8, 4196e8, 2, 482d70, 478870, 4196c0)
001be63c PostgresMain (51, 368800, 0, 51, 3af400, 1) + 1158
00191280 ???????? (3d3a78, 371f4c, c81, 30fbd8, 3c2858, 4)
001909d4 ???????? (3d3a78, ffbff858, 0, 0, 0, 3bb128)
0018e838 ???????? (382918, 6, 5390ff21, 382918, 10, 3d3a78)
0018e374 PostmasterMain (382400, 368000, 3af400, 18f400, 18f400, 18f400) + d24
00140f04 main (3, ffbffa94, 3bad50, 306800, 306800, 306800) + 210
00047990 _start (0, 0, 0, 0, 0, 0) + 108
PostgreSQL dumped core while ACSLS was trying to commit an update to the lsmtable:
---
2014-07-09 18:47:01 EDT acsdb 4865 0 LOG:
statement: update lsmtable set lsm_activity = lsm_activity + 1
where acs = 0 and lsm = 3
2014-07-09 18:47:01 EDT acsdb 4865 120574 PANIC:
could not write to log file 0,
segment 33 at offset 3268608, length 16384: Checksum failure
2014-07-09 18:47:01 EDT acsdb 4865 120574 STATEMENT:
update lsmtable set lsm_activity = lsm_activity + 1
where acs = 0 and lsm = 3
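The PANIC message above gives a byte offset and length within a WAL segment. As a rough illustration, the sketch below converts those numbers into WAL block positions. It assumes PostgreSQL's default WAL block size of 8192 bytes (a build with a different block size would give different numbers); `locate_wal_write` is a hypothetical helper name.

```python
import re

XLOG_BLCKSZ = 8192  # assumption: default PostgreSQL WAL block size in bytes

# The log wraps the message across lines, so whitespace is matched loosely.
PANIC = re.compile(
    r"could not write to log file (?P<log>\d+),\s*segment (?P<seg>\d+)"
    r"\s+at offset (?P<offset>\d+),\s*length (?P<length>\d+)"
)


def locate_wal_write(message):
    """Return the WAL location described by the PANIC message, or None if absent."""
    m = PANIC.search(message)
    if m is None:
        return None
    offset, length = int(m["offset"]), int(m["length"])
    return {
        "log": int(m["log"]),
        "segment": int(m["seg"]),
        "first_block": offset // XLOG_BLCKSZ,
        "blocks": -(-length // XLOG_BLCKSZ),  # ceiling division
        "block_aligned": offset % XLOG_BLCKSZ == 0,
    }


if __name__ == "__main__":
    text = (
        "could not write to log file 0,\n"
        "segment 33 at offset 3268608, length 16384: Checksum failure"
    )
    print(locate_wal_write(text))
```

Under that assumption, the failed write in this incident starts at block 399 of segment 33 and covers two blocks.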
Message in acsss_event.log when the PostgreSQL backend crashed:
---
2014-07-09 18:47:05 MOUNT[0]:
1328 N di_pri_get_status_code.c Unknown 212
di_pri_get_status_code: DBMS error.
Return code (08S01) and message ("[unixODBC]No response from the backend;
No response from the backend"). DI_STATUS = DI_S_FAILURE
Message in pg_log transaction file when PostgreSQL terminated:
---
2014-07-09 18:47:05 EDT 2752 0 LOG:
server process (PID 4865) was terminated by signal 6
2014-07-09 18:47:05 EDT 2752 0 LOG:
terminating any other active server processes
The pg_log transaction file shows these messages from the day before the DB backend crashed:
---
postgresql-2014-07-08_213318.log:2014-07-08 21:33:19 EDT 2756 0 ERROR: could not write block 16 of relation 1663/16384/17046: Checksum failure
postgresql-2014-07-08_213318.log:2014-07-08 21:33:20 EDT 2756 0 ERROR: could not write block 16 of relation 1663/16384/17046: Checksum failure
postgresql-2014-07-08_213318.log:2014-07-08 21:33:21 EDT 2756 0 ERROR: could not write block 16 of relation 1663/16384/17046: Checksum failure
postgresql-2014-07-08_213318.log:2014-07-08 21:33:22 EDT 2756 0 ERROR: could not write block 16 of relation 1663/16384/17046: Checksum failure
postgresql-2014-07-08_213318.log:2014-07-08 21:33:23 EDT 2756 0 ERROR: could not write block 16 of relation 1663/16384/17046: Checksum failure
postgresql-2014-07-08_213318.log:2014-07-08 21:33:24 EDT 2756 0 ERROR: could not write block 16 of relation 1663/16384/17046: Checksum failure
postgresql-2014-07-08_213318.log:2014-07-08 21:33:25 EDT 2756 0 ERROR: could not write block 16 of relation 1663/16384/17046: Checksum failure
...
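Because these write errors appeared a day before the crash, scanning pg_log for them could serve as an early warning. The sketch below groups matching entries by relation and block. The three numbers in `relation 1663/16384/17046` are the tablespace OID, database OID, and relfilenode (1663 is the default `pg_default` tablespace). The regular expression assumes the log line layout shown above, and `summarize_checksum_failures` is a hypothetical helper name.

```python
import re

# Matches entries such as:
# 2014-07-08 21:33:19 EDT 2756 0 ERROR: could not write block 16 of
# relation 1663/16384/17046: Checksum failure
PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ .*?"
    r"could not write block (?P<block>\d+) of relation "
    r"(?P<tablespace>\d+)/(?P<database>\d+)/(?P<relfilenode>\d+): Checksum failure"
)


def summarize_checksum_failures(lines):
    """Group failures by (tablespace, database, relfilenode, block).

    Each value records how many times the error was logged and the first
    and last timestamps seen, in input order.
    """
    summary = {}
    for line in lines:
        m = PATTERN.search(line)
        if not m:
            continue
        key = (m["tablespace"], m["database"], m["relfilenode"], int(m["block"]))
        entry = summary.setdefault(
            key, {"count": 0, "first": m["ts"], "last": m["ts"]}
        )
        entry["count"] += 1
        entry["last"] = m["ts"]
    return summary


if __name__ == "__main__":
    sample = [
        "postgresql-2014-07-08_213318.log:2014-07-08 21:33:19 EDT 2756 0 ERROR: "
        "could not write block 16 of relation 1663/16384/17046: Checksum failure",
        "postgresql-2014-07-08_213318.log:2014-07-08 21:33:20 EDT 2756 0 ERROR: "
        "could not write block 16 of relation 1663/16384/17046: Checksum failure",
    ]
    for key, info in summarize_checksum_failures(sample).items():
        print(key, info)
```

The pattern uses `search`, so it works on raw log lines as well as on `grep` output that prefixes each line with the file name.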
From the system log, the backend crashed on Jul 9:
-----
Jul 9 18:47:03 hostname genunix: [ID 603404 kern.notice] NOTICE: core_log: postgres[4865] core dumped: /var/corefiles/postgres.XXXXXX.4000001.XXXX.hostname.core
Jul 9 18:47:36 hostname svc.startd[2126]: [ID 122153 daemon.warning] svc:/application/slm/acsdb:default: Method or service exit timed out. Killing contract 269994.
Jul 9 18:47:36 hostname svc.startd[2126]: [ID 748625 daemon.error] application/slm/acsdb:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Jul 9 18:47:36 hostname svc.startd[2126]: [ID 748625 daemon.error] application/slm/acsdb:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Changes
Upgrade of ACSLS HA to version 8.3
Cause