Storage Split Caused Crash of ASM Diskgroup With Normal Redundancy
(Doc ID 1623400.1)
Last updated on AUGUST 15, 2023
Applies to:
Oracle Database - Enterprise Edition - Version 11.2.0.3 to 12.1.0.1 [Release 11.2 to 12.1]
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.
Symptoms
2-node 11.2.0.4 RAC with ASM and normal redundancy diskgroups. When the storage link is broken, so that Node-A cannot access Storage-B and Node-B cannot access Storage-A, both instances are affected.
The diskgroups are dismounted on both nodes. One site should survive, because:
+ two copies of the vote files available
+ one failure group was available to a node
+ interconnect was fine
RDBMS2 alert.log
----------------
CKPT (ospid: 2850): terminating the instance due to error 221
Mon Dec 09 13:42:47 2013
RDBMS1 alert.log
----------------
Mon Dec 09 13:43:23 2013
System state dump requested by (instance=1, osid=4991 (CKPT)), summary=[abnormal instance termination].
ASM1 alert.log
--------------
>>>>> After the storage link was broken, the inaccessible disks were initially
taken offline, which is expected, but then:
Mon Dec 09 13:43:20 2013
ERROR: no read quorum in group: required 2, found 0 disks
ERROR: Could not read PST for grp 2. Force dismounting the disk group.
Mon Dec 09 13:43:20 2013
WARNING: dirty detached from domain 2
NOTE: cache dismounted group 2/0x71E81044 (FLASH_DB67043)
SQL> alter diskgroup FLASH_DB67043 dismount force /* ASM SERVER:1911033924 */
Mon Dec 09 13:43:23 2013
NOTE: ASM client s100sb5l1:S100SB5L disconnected unexpectedly.
>>>>> Later the diskgroup was mounted successfully
Mon Dec 09 13:44:11 2013
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (FLASH_DB67043)
NOTE: LGWR found thread 1 closed at ABA 10.10223
NOTE: LGWR mounted thread 1 for diskgroup 2 (FLASH_DB67043)
NOTE: LGWR opening thread 1 at fcn 0.524679 ABA 11.10224
NOTE: cache mounting group 2/0xC228151D (FLASH_DB67043) succeeded
NOTE: cache ending mount (success) of group FLASH_DB67043 number=2 incarn=0xc228151d
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup FLASH_DB67043 was mounted
>>>>> ASM1 instance crash
Mon Dec 09 13:45:24 2013
Received an instance abort message from instance 2
Received an instance abort message from instance 2
Please check instance 2 alert and LMON trace files for detail.
Please check instance 2 alert and LMON trace files for detail.
LMS0 (ospid: 17379): terminating the instance due to error 481
Instance terminated by LMS0, pid = 17379
>>>>> The other two diskgroups, DATA and OCR_VOTE, were not dismounted
ASM2 alert.log
--------------
>>>>> After the storage link was broken, the inaccessible disks were initially
taken offline, which is expected, but then:
Mon Dec 09 13:42:43 2013
Mon Dec 09 13:42:45 2013
NOTE: cache dismounting (not clean) group 1/0xEC1528D2 (DATA_DB67043)
GMON updating disk modes for group 1 at 11 for pid 27, osid 9957
NOTE: messaging CKPT to quiesce pins Unix process pid: 9963, image: oracle@D100STUL0702 (B000)
ERROR: no read quorum in group: required 2, found 0 disks
Mon Dec 09 13:42:46 2013
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0xEC1528D2 (DATA_DB67043)
SQL> alter diskgroup DATA_DB67043 dismount force /* ASM SERVER:3960809682 */
Mon Dec 09 13:42:47 2013
NOTE: cache dismounted group 2/0x71E528D3 (FLASH_DB67043)
SQL> alter diskgroup FLASH_DB67043 dismount force /* ASM SERVER:1910843603 */
ERROR: Could not read PST for grp 2. Force dismounting the disk group.
Mon Dec 09 13:42:48 2013
NOTE: cache dismounting (not clean) group 3/0x14C528D4 (OCR_VOT)
NOTE: messaging CKPT to quiesce pins Unix process pid: 10211, image: oracle@D100STUL0702 (B002)
NOTE: halting all I/Os to diskgroup 3 (OCR_VOT)
ERROR: no read quorum in group: required 2, found 0 disks
>>>>> All 3 diskgroups were dismounted
This means that even with normal redundancy, a storage split causes the diskgroups to dismount.
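When comparing the two ASM alert logs, it can help to pull out just the read-quorum errors and the dismounted groups. The following is a minimal Python sketch, not an Oracle-supplied tool. The function name summarize_alert_log and both regular expressions are assumptions made for illustration; the patterns are derived only from the message text shown in the excerpts above, and other ASM versions may word these messages differently.

```python
import re

# Patterns derived from the alert.log excerpts in this note (assumption:
# the message wording is stable for the release being examined).
QUORUM_RE = re.compile(
    r"ERROR: no read quorum in group: required (\d+), found (\d+) disks"
)
DISMOUNT_RE = re.compile(
    r"NOTE: cache dismounted group (\d+)/0x[0-9A-Fa-f]+ \((\w+)\)"
)


def summarize_alert_log(text):
    """Return (quorum_errors, dismounted_groups) found in an ASM alert.log excerpt.

    quorum_errors     -> list of (required, found) integer pairs
    dismounted_groups -> list of (group_number, diskgroup_name) pairs
    """
    quorum_errors = [(int(req), int(found)) for req, found in QUORUM_RE.findall(text)]
    dismounted = [(int(num), name) for num, name in DISMOUNT_RE.findall(text)]
    return quorum_errors, dismounted


# Lines taken from the ASM1 alert.log excerpt above.
sample = """\
ERROR: no read quorum in group: required 2, found 0 disks
ERROR: Could not read PST for grp 2. Force dismounting the disk group.
WARNING: dirty detached from domain 2
NOTE: cache dismounted group 2/0x71E81044 (FLASH_DB67043)
"""

errors, groups = summarize_alert_log(sample)
print(errors)  # [(2, 0)]
print(groups)  # [(2, 'FLASH_DB67043')]
```

Running the same function over the full ASM1 and ASM2 alert logs gives a quick side-by-side view of which diskgroups each instance dismounted. Lines such as "cache dismounting (not clean) group ..." are intentionally not matched, so only completed dismounts are counted.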
Cause