Storage Split Caused Crash of ASM Diskgroup With Normal Redundancy (Doc ID 1623400.1)

Last updated on AUGUST 15, 2023

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.3 to 12.1.0.1 [Release 11.2 to 12.1]
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.

Symptoms

A 2-node 11.2.0.4 RAC cluster with ASM uses normal redundancy diskgroups. When the storage link is broken, so that Node-A cannot access Storage-B and Node-B cannot access Storage-A, both instances are affected.

The diskgroups are dismounted on both nodes. One site should survive, because:

+ two copies of the vote files were available
+ one failure group was available to each node
+ the interconnect was fine

RDBMS2 alert.log
----------------

CKPT (ospid: 2850): terminating the instance due to error 221
Mon Dec 09 13:42:47 2013

RDBMS1 alert.log
----------------

Mon Dec 09 13:43:23 2013
System state dump requested by (instance=1, osid=4991 (CKPT)),
summary=[abnormal instance termination].


ASM1 alert.log
--------------

>>>>> After the storage link was broken, the inaccessible disks were initially
taken offline, which is expected, but then:

Mon Dec 09 13:43:20 2013
ERROR: no read quorum in group: required 2, found 0 disks
ERROR: Could not read PST for grp 2. Force dismounting the disk group.

Mon Dec 09 13:43:20 2013
WARNING: dirty detached from domain 2
NOTE: cache dismounted group 2/0x71E81044 (FLASH_DB67043)
SQL> alter diskgroup FLASH_DB67043 dismount force /* ASM SERVER:1911033924 */

Mon Dec 09 13:43:23 2013
NOTE: ASM client s100sb5l1:S100SB5L disconnected unexpectedly.

>>>>> Later the diskgroup was mounted successfully

Mon Dec 09 13:44:11 2013
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (FLASH_DB67043)
NOTE: LGWR found thread 1 closed at ABA 10.10223
NOTE: LGWR mounted thread 1 for diskgroup 2 (FLASH_DB67043)
NOTE: LGWR opening thread 1 at fcn 0.524679 ABA 11.10224
NOTE: cache mounting group 2/0xC228151D (FLASH_DB67043) succeeded
NOTE: cache ending mount (success) of group FLASH_DB67043 number=2
incarn=0xc228151d
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup FLASH_DB67043 was mounted

>>>>> ASM1 instance crash

Mon Dec 09 13:45:24 2013
Received an instance abort message from instance 2
Please check instance 2 alert and LMON trace files for detail.

LMS0 (ospid: 17379): terminating the instance due to error 481
Instance terminated by LMS0, pid = 17379

>>>>> The other two diskgroups, DATA and OCR_VOTE, were not dismounted

ASM2 alert.log
--------------

>>>>> After the storage link was broken, the inaccessible disks were initially
taken offline, which is expected, but then:

Mon Dec 09 13:42:43 2013

Mon Dec 09 13:42:45 2013
NOTE: cache dismounting (not clean) group 1/0xEC1528D2 (DATA_DB67043)
GMON updating disk modes for group 1 at 11 for pid 27, osid 9957
NOTE: messaging CKPT to quiesce pins Unix process pid: 9963, image:
oracle@D100STUL0702 (B000)
ERROR: no read quorum in group: required 2, found 0 disks

Mon Dec 09 13:42:46 2013
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0xEC1528D2 (DATA_DB67043)
SQL> alter diskgroup DATA_DB67043 dismount force /* ASM SERVER:3960809682 */


Mon Dec 09 13:42:47 2013

NOTE: cache dismounted group 2/0x71E528D3 (FLASH_DB67043)
SQL> alter diskgroup FLASH_DB67043 dismount force /* ASM SERVER:1910843603 */

ERROR: Could not read PST for grp 2. Force dismounting the disk group.

Mon Dec 09 13:42:48 2013
NOTE: cache dismounting (not clean) group 3/0x14C528D4 (OCR_VOT)
NOTE: messaging CKPT to quiesce pins Unix process pid: 10211, image:
oracle@D100STUL0702 (B002)
NOTE: halting all I/Os to diskgroup 3 (OCR_VOT)
ERROR: no read quorum in group: required 2, found 0 disks

>>>>> All 3 diskgroups were dismounted

That means that even with normal redundancy, a storage split causes the diskgroups to dismount.
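
When comparing the two ASM alert logs, it can help to pull out just the quorum errors and the forced dismounts. The following is a minimal sketch, not an Oracle-supplied tool: the function name `scan_alert_log` and the regular expressions are assumptions, based only on the message formats quoted in the excerpts above.

```python
import re

# Patterns modelled on the alert log lines quoted in this note (assumed formats).
QUORUM_RE = re.compile(
    r"ERROR: no read quorum in group: required (\d+), found (\d+) disks"
)
DISMOUNT_RE = re.compile(r"SQL> alter diskgroup (\S+) dismount force")


def scan_alert_log(lines):
    """Return (quorum_errors, forced_dismounts) found in the given lines.

    quorum_errors is a list of (required, found) integer pairs;
    forced_dismounts is a list of diskgroup names, in order of appearance.
    """
    quorum_errors = []
    forced_dismounts = []
    for line in lines:
        match = QUORUM_RE.search(line)
        if match:
            quorum_errors.append((int(match.group(1)), int(match.group(2))))
        match = DISMOUNT_RE.search(line)
        if match:
            forced_dismounts.append(match.group(1))
    return quorum_errors, forced_dismounts


# Sample lines copied from the ASM2 alert.log excerpt above.
sample = [
    "Mon Dec 09 13:42:45 2013",
    "ERROR: no read quorum in group: required 2, found 0 disks",
    "SQL> alter diskgroup DATA_DB67043 dismount force /* ASM SERVER:3960809682 */",
    "SQL> alter diskgroup FLASH_DB67043 dismount force /* ASM SERVER:1910843603 */",
]

quorum_errors, forced_dismounts = scan_alert_log(sample)
print(quorum_errors)
print(forced_dismounts)
```

Running the same function over the alert log of each ASM instance gives a side-by-side list of which diskgroups were force dismounted on which node.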

Cause


