SuperCluster : RAC : CRS not able to rejoin the cluster following node eviction or reboot due to CSSD. (Doc ID 2166436.1)

Last updated on JULY 28, 2016

Applies to:

Oracle SuperCluster M6-32 Hardware - Version All Versions and later
Solaris SPARC Operating System - Version 11.1 to 11.2 [Release 11.0]
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
SPARC SuperCluster T4-4 - Version All Versions and later
Oracle SuperCluster M7 Hardware - Version All Versions and later
Oracle Solaris on SPARC (64-bit)
Grid Infrastructure on Oracle SuperCluster. Evictions. CRS Startup. CSSD. GIPCD.

Symptoms

CRS and CSSD on a RAC node will not restart following a node eviction.

The OCSSD log will have entries similar to:


[ CSSD][28]clssnmvDHBValidateNcopy: node 1, nodename , has a disk HB, but no network HB, DHB has rcfg 329583179, wrtcnt, 50671015, LATS 2293246942, lastSeqNo 50671012, uniqueness 1434089571, timestamp 1446423142/3810007342
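A quick way to check whether a given OCSSD log contains this symptom is to scan it for the "has a disk HB, but no network HB" message. The following is a minimal sketch, not an Oracle-supplied tool: the function name `find_disk_hb_without_network_hb` is hypothetical, and the pattern assumes the message layout matches the sample entry above (where "nodename" stands in for the actual node name). The log file location is not assumed; pass in lines from whichever OCSSD log you are examining.

```python
import re

# Matches the clssnmvDHBValidateNcopy message from the sample entry.
# Assumption: the node name sits between "node <n>," and ", has a disk HB".
DHB_NO_NHB = re.compile(
    r"clssnmvDHBValidateNcopy: node (?P<node>\d+), (?P<name>[^,]*?)\s*, "
    r"has a disk HB, but no network HB"
)


def find_disk_hb_without_network_hb(lines):
    """Return a list of (node_number, node_name) for each matching line."""
    hits = []
    for line in lines:
        match = DHB_NO_NHB.search(line)
        if match:
            hits.append((int(match.group("node")), match.group("name")))
    return hits
```

Repeated hits for the same node suggest that the node is still writing disk heartbeats while its network heartbeat is not being received, which is the condition the sample entry reports.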

The GIPCD log will have entries similar to:

[GIPCDMON][7] gipcdMonitorCssCheck: Failure querying CSS NodeList ret 3
[GIPCDMON][7] gipcdMonitorFailZombieNodes: Forcing zombie failure, node nodename, now 0, last 2267807392,
[GIPCDNDE][6] gipcdNodeDisconnect: Deleting information for remote con host(nodename), id (0000000000000000,0000000000000912)
[GIPCDNDE][6] gipcdNodeDisconnect: Deleting information of all clients on remote endps (0000000000000000,0000000000000912), (0000000000000000, 0000000000000a0d)
[GIPCDCLT][5] gipcdDeleteInterfaces: No interface object exist in the map for haname(1a02-b661-3eaa-f863)
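The GIPCD entries above can be checked for in the same way. This is a rough sketch under stated assumptions: the function name `count_gipcd_signatures` is hypothetical, and the signature strings are copied from the sample entries, so they may need adjusting if your Grid Infrastructure version words the messages differently.

```python
from collections import Counter

# Substrings taken from the sample GIPCD entries in this note.
GIPCD_SIGNATURES = (
    "gipcdMonitorCssCheck: Failure querying CSS NodeList",
    "gipcdMonitorFailZombieNodes: Forcing zombie failure",
    "gipcdNodeDisconnect: Deleting information",
    "gipcdDeleteInterfaces: No interface object exist",
)


def count_gipcd_signatures(lines):
    """Count how many lines contain each known GIPCD symptom signature."""
    counts = Counter()
    for line in lines:
        for signature in GIPCD_SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts
```

A log that shows all four signatures alongside the OCSSD message is a closer match to the symptoms described here than a log that shows only one of them.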

will reflect that cssd has not started

A pstack of the cssd process, taken before it times out, will look similar to:

pstack output showing the main stacks being looped through:

------------ lwp# 36 / thread# 36 ---------------
ffffffff7ed473a4 lwp_park (0, 0, 0)
ffffffff7ed40ba4 cond_wait_queue (100398050, 100c7edb0, 0, 0, ffffffff7ee8cfc0, 0) + 4c
ffffffff7ed41198 cond_wait (100398050, 100c7edb0, deadbeef, 1092bc, 1, 1) + 10
ffffffff57967708 sltspcwait (100399f10, 100cd3dd8, 100cd3dc0, 1, 0, 0) + 8
ffffffff59c1e260 clsucvwait (100399f10, 100cd3dd8, 100cd3dc0, 0, 0, ffffffff7ee86000) + 28
000000010009f710 clssgmProcDeadClntq (10191bd10, 100cd3d88, 1, ffffffff7e509a40, 0, fffc00) + 1e8
000000010009f4a8 clssgmDeathChkThread (10191bd10, ff000000, ffffffff7ee8cfc0, 1, 0, fffc00) + 190
00000001000117f8 clssscthrdmain (10191bd10, 0, 0, 100011718, 0, 1) + e0
ffffffff7ed47328 _lwp_start (0, 0, 0, 0, 0, 0)
------------ lwp# 37 / thread# 37 ---------------
ffffffff7ed4b794 portfs (5, 60, 103aea010, 0, 0, 0)
ffffffff59ebec20 sgipcwEpollWaitHelper (0, 1003a0610, 0, 0, 0, 0) + 2b8
ffffffff59eb8e4c sgipcwWait (0, 1000, 0, 1003a0610, 1000, 0) + 3fc
ffffffff59d62f7c gipcWaitOsd (0, 1000, 1003a0610, 0, 0, ffffffff79fe0c4c) + 18c
ffffffff59d4c5a0 gipcInternalWaitEpoll (ffffffff79fe1298, 10191b450, ffffffff79fe1860, 200, ffffffff79fe185c, ffffffff) + 1278
ffffffff59d46420 gipcInternalWait (ffffffff, 10191b450, 100105533, 1000ddb78, bbe, ffffffff) + 1c40
ffffffff59ce60b8 gipcWaitF (ffffffff, 1000ddb78, 100105533, 1000ddb78, bbe, 200) + b58
00000001000150d4 clssscSelect (10191b590, 100cd3a20, 977, 100060d88, 0, fffc00) + 9c
000000010005aa6c clssgmPeerListener (100cd3a20, ff000000, ffffffff7ee8cfc0, 1, 0, fffc00) + 1d34
00000001000117f8 clssscthrdmain (10191b590, 0, 0, 100011718, 0, 1) + e0
ffffffff7ed47328 _lwp_start (0, 0, 0, 0, 0, 0)
------------ lwp# 38 / thread# 38 ---------------
ffffffff7ed473a4 lwp_park (0, ffffffff79dfb540, 0)
ffffffff7ed40ba4 cond_wait_queue (100f73510, 10118bc80, ffffffff79dfb540, 0, 0, 0) + 4c
ffffffff7ed41080 cond_wait_common (100f73510, 10118bc80, ffffffff79dfb540, 2946, 0, 0) + 28c
ffffffff7ed41250 __cond_timedwait (100f73510, 10118bc80, ffffffff79dfb6b8, 0, ffffffff79dfb540, 0) + 60
ffffffff7ed41314 cond_timedwait (100f73510, 10118bc80, ffffffff79dfb6b8, 0, 0, 1d12880) + 14
ffffffff57967814 sltspctimewait (20, 101188998, 101188980, 1, 0, 1) + f4
ffffffff59c1e2ec clsucvtimewait (100399f10, 101188998, 101188980, 3e8, 10000000, 0) + 34
00000001000ab000 clssnmWaitThread (10191a910, 101188010, 2, 3e8, 0, 2) + 2f8
00000001000a7d98 clssnmPollingThread (1000faaa8, 10118cb4c, 2ed, 2ed, 0, fffc00) + 770
00000001000117f8 clssscthrdmain (10191a910, 0, 0, 100011718, 0, 1) + e0
ffffffff7ed47328 _lwp_start (0, 0, 0, 0, 0, 0)

 

Changes

None

Cause
