Random Node Reboot after Starting Clusterware on AIX platform (Doc ID 2004728.1)

Last updated on AUGUST 04, 2018

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
IBM AIX on POWER Systems (64-bit)

Symptoms

On a two-node RAC 11.2.0.4 cluster on AIX 7.1, a node reboots randomly. Sometimes the node is evicted after a member kill is escalated to a node kill:

alert_+ASM2.log shows:

2015-04-01 10:20:13 IPC Send timeout detected. Sender: ospid 17760466 [oracle@racnodeb (PING)]
2015-04-01 10:20:13 Receiver: inst 1 binc 1257147021 ospid 7143658
2015-04-01 10:20:35 IPC Send timeout detected. Sender: ospid 16908308 [oracle@racnodeb (LMON)]
2015-04-01 10:20:35 Receiver: inst 1 binc 1257147068 ospid 6357358
2015-04-01 10:20:35 Communications reconfiguration: instance_number 1
2015-04-01 10:22:33 Evicting instance 1 from cluster
2015-04-01 10:22:33 Waiting for instances to leave: 1
2015-04-01 10:22:52 Remote instance kill is issued with system inc 42
2015-04-01 10:22:52 Remote instance kill map (size 1) : 1
2015-04-01 10:22:52 LMON received an instance eviction notification from instance 2
2015-04-01 10:22:52 The instance eviction reason is 0x20000000
2015-04-01 10:22:52 The instance eviction map is 1
2015-04-01 10:23:02 Waiting for instances to leave: 1
2015-04-01 10:23:32 Waiting for instances to leave: 1
2015-04-01 10:23:56 Reconfiguration started (old inc 40, new inc 44)
2015-04-01 10:23:56 List of instances:  2 (myinst: 2)

ocssd.log on node 2 shows:

2015-04-01 10:22:52.942: [    CSSD][1029]clssgmExecuteClientRequest: Member kill request from client (111d10230)
2015-04-01 10:22:52.942: [    CSSD][1029](:CSSGM00044:)clssgmReqMemberKill: Kill requested map 0x00000001 flags 0x2 escalate 0xffffffff
2015-04-01 10:22:52.944: [    CSSD][6204]clssgmMbrKillThread: Kill requested map 0x00000001 id 7 Group name DB+ASM flags 0x00000001 start time 0x51494390 end time 0x5149bab4 time out 30500 req node 2
...
2015-04-01 10:23:23.446: [    CSSD][6204]clssgmMbrKillThread: Time up: Start time 1363755920 End time 1363786420 Current time 1363786422 timeout 30500
2015-04-01 10:23:23.446: [    CSSD][6204]clssgmMbrKillThread: Member kill request complete.
2015-04-01 10:23:23.446: [    CSSD][6204]clssgmMbrKillSendEvent: Missing answers or immediate escalation: Req member 1 Req node 2 Number of answers expected 0 Number of answers outstanding 1
2015-04-01 10:23:23.446: [    CSSD][6204]clssgmQueueGrockEvent: groupName(DB+ASM) count(2) master(1) event(11), incarn 0, mbrc 0, to member 1, events 0x68, state 0x0
2015-04-01 10:23:23.446: [    CSSD][6204]clssgmMbrKillEsc: Escalating node 1 Member request 0x00000001 Member success 0x00000000 Member failure 0x00000000 Number left to kill 1
2015-04-01 10:23:23.446: [    CSSD][6204]clssnmMarkNodeForRemoval: node 1, racnodea marked for removal
2015-04-01 10:23:23.446: [    CSSD][6204]clssnmKillNode: node 1 (racnodea) kill initiated
2015-04-01 10:23:23.446: [    CSSD][6204]clssgmMbrKillThread: Exiting
...
2015-04-01 10:23:55.483: [    CSSD][6205]clssgmCMReconfig: reconfiguration successful, incarnation 321649593 with 1 nodes, local node number 2, master node number 2
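
The escalation sequence in the ocssd.log excerpt above (member kill requested, timeout reached, escalation, node kill) can be picked out of a larger log with a short script. The following is a minimal Python sketch, not an Oracle-supplied tool. The function name find_kill_escalation, the marker list, and the labels are illustrative assumptions based only on the message text shown in this excerpt.

```python
import re

# Message markers taken from the ocssd.log excerpt in this note.
# The labels on the right are illustrative, not Oracle terminology.
MARKERS = [
    ("clssgmReqMemberKill", "member kill requested"),
    ("clssgmMbrKillThread: Time up", "member kill timed out"),
    ("clssgmMbrKillEsc", "escalated to node kill"),
    ("clssnmKillNode", "node kill initiated"),
]

# Leading timestamp as it appears in the excerpt: YYYY-MM-DD HH:MM:SS.mmm:
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):")


def find_kill_escalation(lines):
    """Return (timestamp, label) pairs for each marker line, in file order."""
    events = []
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # skip "..." separators and untimestamped lines
        for needle, label in MARKERS:
            if needle in line:
                events.append((m.group(1), label))
                break
    return events
```

Passing the lines of ocssd.log from the surviving node to find_kill_escalation should list the four steps in order, which makes it easy to confirm that a node kill followed a member kill that was not answered before the timeout.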

However, ocssd.log on node 1 shows nothing before the node reboot:

2015-04-01 10:14:09.280: [    CSSD][1029]clssgmClientConnectMsg: Connect from con(76ae) proc(11398c670) pid(5112104) version 11:2:1:4, properties: 1,2,3,4
2015-04-01 10:14:09.280: [    CSSD][1029]clssgmClientConnectMsg: msg flags 0x0000
2015-04-01 10:14:12.627: [    CSSD][5414]clssnmSendingThread: sending status msg to all nodes
2015-04-01 10:14:12.627: [    CSSD][5414]clssnmSendingThread: sent 4 status msgs to all nodes
<< no message after this time until restart
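
A silent period like the one above can be located in a long ocssd.log by comparing consecutive timestamps. The following is a minimal Python sketch under stated assumptions: the function name find_silent_gaps is hypothetical, and the 60-second default threshold is an arbitrary illustrative value, not an Oracle-documented interval.

```python
import re
from datetime import datetime, timedelta

# Leading timestamp as it appears in the excerpt: YYYY-MM-DD HH:MM:SS.mmm:
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):")


def find_silent_gaps(lines, threshold=timedelta(seconds=60)):
    """Return (previous_ts, next_ts, gap) for each pair of consecutive
    timestamped lines that are farther apart than `threshold`.

    The default threshold is an assumption chosen for illustration.
    """
    gaps = []
    prev = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # ignore lines without a leading timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps
```

The first element of each returned tuple is the last entry written before the silence, which is the timestamp to compare against the eviction time recorded on the other node.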

Sometimes the node is rebooted right after CRS starts up. An AIX crash dump was captured for this case.

Cause


