Node1 Rebooted and Node2 Rebootless Evicted After Loss of Voting File on Node1 (Doc ID 1917926.1)

Last updated on MAY 01, 2016

Applies to:

Oracle Database - Enterprise Edition - Version 12.1.0.1 to 12.1.0.1 [Release 12.1]
Information in this document applies to any platform.

Symptoms

12.1.0.1 2-node Grid Infrastructure, node1 had IO error and rebooted as expected, however, node2 also went through rebootless eviction:


/var/log/messages - node1

Jul 18 06:02:23 racnode1 kernel: lpfc 0000:10:00.0: 1:(0):0748 abort handler timed out waiting for abortng I/O (xri:x0) to complete: ret 0x2003, ID 0, LUN 13
Jul 18 06:03:23 racnode1 kernel: lpfc 0000:10:00.0: 1:(0):0748 abort handler timed out waiting for abortng I/O (xri:x0) to complete: ret 0x2003, ID 0, LUN 13
Jul 18 06:07:28 racnode1 kernel: imklog 5.8.10, log source = /proc/kmsg started.


<GI_HOME>/log/<node>/alert<node>.log - node1

2014-07-18 06:01:51.267:
[cssd(6630)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file /dev/oracleasm/asm1 will be considered not functional in 99870 milliseconds
2014-07-18 06:02:21.167:
[/u01/app/12.1.0/grid/bin/oraagent.bin(9087)]CRS-5011:Check of resource "payprod" failed: details at "(:CLSN00007:)" in "/u01/app/12.1.0/grid/log/racnode1/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2014-07-18 06:02:41.274:
[cssd(6630)]CRS-1614:No I/O has completed after 75% of the maximum interval. Voting file /dev/oracleasm/asm1 will be considered not functional in 49870 milliseconds
2014-07-18 06:03:11.269:
[cssd(6630)]CRS-1613:No I/O has completed after 90% of the maximum interval. Voting file /dev/oracleasm/asm1 will be considered not functional in 19880 milliseconds
2014-07-18 06:03:31.269:
[cssd(6630)]CRS-1604:CSSD voting file is offline: /dev/oracleasm/asm1; details at (:CSSNM00058:) in /u01/app/12.1.0/grid/log/racnode1/cssd/ocssd.log.
2014-07-18 06:03:31.269:
[cssd(6630)]CRS-1606:The number of voting files available, 0, is less than the minimum number of voting files required, 1, resulting in CSSD termination to ensure data integrity; details at (:CSSNM00018:) in /u01/app/12.1.0/grid/log/racnode1/cssd/ocssd.log
2014-07-18 06:03:31.269:
[cssd(6630)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/12.1.0/grid/log/racnode1/cssd/ocssd.log
2014-07-18 06:07:49.995:
[ohasd(5248)]CRS-0714:Oracle Clusterware Release 12.1.0.1.0 - Production Copyright 1996, 2010 Oracle. All rights reserved.


<GI_HOME>/log/<node>/ocssd/ocssd.log - node1

2014-07-18 06:01:51.267: [ CSSD][3600758528]clssnmvDiskCheck: (/dev/oracleasm/asm1) No I/O completed after 50% maximum time, 200000 ms, will be considered unusable in 99870 ms
....
2014-07-18 06:02:34.179: [ CSSD][4230153984]clssscMonitorThreads clssnmvDiskPingThread_0 not scheduled for 142040 msecs
2014-07-18 06:02:34.179: [ CSSD][4230153984]clssscMonitorThreads clssnmvWorkerThread_0 not scheduled for 141760 msecs
....
2014-07-18 06:02:41.274: [ CSSD][3600758528]clssnmvDiskCheck: (/dev/oracleasm/asm1) No I/O completed after 75% maximum time, 200000 ms, will be considered unusable in 49870 ms
....
2014-07-18 06:03:11.269: [ CSSD][3600758528]clssnmvDiskCheck: (/dev/oracleasm/asm1) No I/O completed after 90% maximum time, 200000 ms, will be considered unusable in 19880 ms
....
2014-07-18 06:03:27.487: [ CSSD][4230153984]clssscMonitorThreads clssnmvDiskPingThread_0 not scheduled for 195340 msecs
2014-07-18 06:03:27.487: [ CSSD][4230153984]clssscMonitorThreads clssnmvWorkerThread_0 not scheduled for 195060 msecs
....
2014-07-18 06:03:31.269: [ CSSD][3600758528](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for 200120 ms for voting file /dev/oracleasm/asm1)
2014-07-18 06:03:31.269: [ CSSD][3600758528]clssnmvDiskAvailabilityChange: voting file /dev/oracleasm/asm1 now offline
2014-07-18 06:03:31.269: [ CSSD][3600758528](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
2014-07-18 06:03:31.269: [ CSSD][3600758528]###################################
2014-07-18 06:03:31.269: [ CSSD][3600758528]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
2014-07-18 06:03:31.269: [ CSSD][3600758528]###################################
2014-07-18 06:03:31.269: [ CSSD][3600758528](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2014-07-18 06:03:31.269: [ CSSD][3600758528]####### Begin Diagnostic Dump #######


<GI_HOME>/log/<node>/alert<node>.log - node2

2014-07-18 06:03:45.903:
[cssd(5938)]CRS-1612:Network communication with node racnode1 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.530 seconds
2014-07-18 06:03:53.905:
[cssd(5938)]CRS-1611:Network communication with node racnode1 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.530 seconds
2014-07-18 06:03:57.906:
[cssd(5938)]CRS-1610:Network communication with node racnode1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.530 seconds
2014-07-18 06:04:02.239:
[cssd(5938)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u01/app/12.1.0/grid/log/racnode2/cssd/ocssd.log.
2014-07-18 06:04:02.239:
[cssd(5938)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/12.1.0/grid/log/racnode2/cssd/ocssd.log
2014-07-18 06:04:02.282:
[cssd(5938)]CRS-1652:Starting clean up of CRSD resources.
2014-07-18 06:04:03.546:
[/u01/app/12.1.0/grid/bin/oraagent.bin(8949)]CRS-5016:Process "/u01/app/12.1.0/grid/opmn/bin/onsctli" spawned by agent "/u01/app/12.1.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/12.1.0/grid/log/racnode2/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2014-07-18 06:04:04.147:
[/u01/app/12.1.0/grid/bin/oraagent.bin(8949)]CRS-5016:Process "/u01/app/12.1.0/grid/bin/lsnrctl" spawned by agent "/u01/app/12.1.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/12.1.0/grid/log/racnode2/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2014-07-18 06:04:04.153:
[/u01/app/12.1.0/grid/bin/oraagent.bin(8949)]CRS-5016:Process "/u01/app/12.1.0/grid/bin/lsnrctl" spawned by agent "/u01/app/12.1.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/12.1.0/grid/log/racnode2/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2014-07-18 06:04:04.368:
[cssd(5938)]CRS-1654:Clean up of CRSD resources finished successfully.
2014-07-18 06:04:04.369:
[cssd(5938)]CRS-1655:CSSD on node racnode2 detected a problem and started to shutdown.
2014-07-18 06:04:04.394:
[/u01/app/12.1.0/grid/bin/orarootagent.bin(8953)]CRS-5822:Agent '/u01/app/12.1.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:3:25699} in /u01/app/12.1.0/grid/log/racnode2/agent/crsd/orarootagent_root/orarootagent_root.log.


<GI_HOME>/log/<node>/ocssd/ocssd.log - node2

2014-07-18 06:03:45.903: [ CSSD][2561263360]clssnmPollingThread: node racnode1 (1) at 50% heartbeat fatal, removal in 14.530 seconds
2014-07-18 06:03:45.904: [ CSSD][2561263360]clssnmPollingThread: node racnode1 (1) is impending reconfig, flag 2491406, misstime 15470
2014-07-18 06:03:45.904: [ CSSD][2561263360]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
2014-07-18 06:03:45.904: [ CSSD][2564417280]clssnmvDHBValidateNCopy: node 1, racnode1, has a disk HB, but no network HB, DHB has rcfg 295441781, wrtcnt, 9109148, LATS 1390906938, lastSeqNo 8764042, uniqueness 1400000963, timestamp 1405688412/1390750188
2014-07-18 06:03:45.924: [ CSSD][2567579392]clssnmvDiskPing: Writing with status 0x3, timestamp 1405688625/1390906958
....
2014-07-18 06:04:00.437: [ CSSD][2561263360]clssnmPollingThread: Removal started for node racnode1 (1), flags 0x26040e, state 3, wt4c 0
2014-07-18 06:04:00.437: [ CSSD][2561263360]clssnmMarkNodeForRemoval: node 1, racnode1 marked for removal
2014-07-18 06:04:00.437: [ CSSD][2561263360]clssnmDiscHelper: racnode1, node(1) connection failed, endp (0x5f8), probe((nil)), ninf->endp 0x1000005f8
2014-07-18 06:04:00.437: [ CSSD][2561263360]clssnmDiscHelper: node 1 clean up, endp (0x5f8), init state 5, cur state 5
....
2014-07-18 06:04:02.239: [ CSSD][2558093056]clssnmCheckDskInfo: Checking disk info...
2014-07-18 06:04:02.239: [ CSSD][2558093056]clssnmCheckSplit: Node 1, racnode1, is alive, DHB (1405688412, 1390750188) more than disk timeout of 27000 after the last NHB (1405688610, 1390948468)
2014-07-18 06:04:02.239: [ CSSD][2558093056]clssnmCheckDskInfo: My cohort: 2
2014-07-18 06:04:02.239: [ CSSD][2558093056]clssnmCheckDskInfo: Surviving cohort: 1
2014-07-18 06:04:02.239: [ CSSD][2558093056](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, racnode2, is smaller than cohort of 1 nodes led by node 1, racnode1, based on map type 2
2014-07-18 06:04:02.239: [ CSSD][2558093056]###################################
2014-07-18 06:04:02.239: [ CSSD][2558093056]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
2014-07-18 06:04:02.239: [ CSSD][2558093056]###################################
2014-07-18 06:04:02.239: [ CSSD][2558093056](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2014-07-18 06:04:02.239: [ CSSD][2558093056]####### Begin Diagnostic Dump #######

 

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms