Exadata: Failed Bundle patch/patchset apply leaves clusterware unable to start

(Doc ID 1498408.1)

Last updated on JUNE 06, 2013

Applies to:

Exadata Database Machine X2-2 Full Rack - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

Grid home versions 11.2.X

+ A failed bundle patch apply leaves clusterware unable to start.

+ Reviewing the opatch logs for the failed apply shows that the automatic rollback of the patch also failed.

+ A manual rollback of the bundle patch (following the BP readme) was attempted, but the clusterware still fails to start.

+ Checking the status of the clusterware with "crsctl stat res -t -init" shows the following resource states:

ora.asm      OFFLINE
ora.crsd     OFFLINE
ora.diskmon  OFFLINE
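
The status check above can be scripted. The following is a minimal sketch, not part of the original note: offline_resources is a hypothetical helper name, and the two input layouts it accepts (the condensed "name STATE" form shown above, and a tabular form with the resource name on one line followed by an indexed TARGET/STATE line) are assumptions about the crsctl output that should be verified on the actual system.

```shell
# Hypothetical helper: list resources reported OFFLINE.
# Reads "crsctl stat res -t -init" style text on stdin.
# Both accepted layouts are assumptions, see the paragraph above.
offline_resources() {
    awk '
        # Condensed form: resource name and state on the same line.
        $1 ~ /^ora\./ && NF >= 2 {
            if ($2 == "OFFLINE") print $1
            name = ""
            next
        }
        # Tabular form: resource name alone on a line ...
        $1 ~ /^ora\./ && NF == 1 { name = $1; next }
        # ... followed by "<index> <TARGET> <STATE> ...".
        name != "" && $1 ~ /^[0-9]+$/ && $3 == "OFFLINE" { print name }
    '
}
```

A possible invocation on a cluster node is "crsctl stat res -t -init | offline_resources". With the states listed above it would be expected to print ora.asm, ora.crsd and ora.diskmon.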

+ The ASM alert log shows the following errors during startup, while trying to mount the diskgroups:

-----------------------------
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:2} */
NOTE: Diskgroups listed in ASM_DISKGROUPS are
      DATA_PT01
      RECO_PT01
NOTE: Diskgroup used for Voting files is:
      DBFS_DG
Diskgroup with spfile:DBFS_DG
Diskgroup used for OCR is:DBFS_DG
NOTE: cache registered group DATA_PT01 number=1 incarn=0x1e8993b7

.

.
DSKM process appears to be hung. Initiating system state dump.
Thu Oct 11 23:13:37 2012
System state dump requested by (instance=1, osid=20817 (GEN0)), summary=[system state dump request (ksz_check_ds)].

.
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_dskm_20823.trc:
ORA-56867: Cannot connect to Master Diskmon on pipe "default pipe"
ORA-27300: OS system dependent operation:connect failed with status: 111
ORA-27301: OS failure message: Connection refused
ORA-27302: failure occurred at: skgznpcon6
DSKM (ospid: 20823): terminating the instance due to error 56867

----------------------
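
To check whether an alert log shows the same error signature as the excerpt above, a small filter can help. This is a minimal sketch, not part of the original note: diskmon_signature is a hypothetical helper name, and the pattern only covers the messages quoted in the excerpt.

```shell
# Hypothetical helper: print the lines of an ASM alert log that match the
# error signature quoted in this note. Reads the files given as arguments,
# or stdin when no argument is given.
diskmon_signature() {
    grep -E 'ORA-(56867|27300|27301|27302)|DSKM process appears to be hung' "$@"
}
```

The helper could be run against the ASM alert log of the affected node, for example "diskmon_signature alert_+ASM1.log" from the trace directory shown in the excerpt. The alert log file name is an assumption based on the instance name +ASM1.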

 

+ Additional symptoms observed in other reported issues (which may or may not be present):

 o Core files are generated in "$GRID_HOME/log/<hostname>/diskmon" due to a diskmon crash.

 o "crsctl start res ora.diskmon -init" may bring the resource online, but it goes back offline as soon as ASM tries to start.

 o The size or checksum reported by "cksum $GRID_HOME/bin/diskmon.bin" may differ across the nodes.

 o The DISKMON log reports connectivity issues that abort the clusterware startup, e.g.:

2013-05-17 19:32:02.860: [ DISKMON][20792:1105443136] dskm_new_ossb10: oss_open for device o/10.217.206.41 (inc 0, ossbp 0x2aaaac013e60) failed with error 2
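
Two of the optional symptoms above (differing diskmon.bin checksums and oss_open failures in the DISKMON log) can be checked with small filters. This is a minimal sketch, not part of the original note. Both function names are hypothetical. The checksum helper assumes one input line per node in the form "<host>: <checksum> <size> <path>", which is how the cksum output is expected to look when collected with a host prefix; the log helper assumes messages worded like the example line above.

```shell
# Hypothetical helper: succeed (exit status 0) when the input shows more than
# one distinct checksum/size pair, i.e. diskmon.bin differs between nodes.
# Assumed input: one line per node, "<host>: <checksum> <size> <path>".
diskmon_checksums_differ() {
    awk '!seen[$2 " " $3]++ { n++ } END { exit !(n > 1) }'
}

# Hypothetical helper: list the distinct devices for which the DISKMON log
# reports an oss_open failure. Reads the files given as arguments, or stdin.
failed_cell_devices() {
    sed -n 's/.*oss_open for device \([^ ]*\) .*failed with error.*/\1/p' "$@" | sort -u
}
```

On Exadata, the per-node lines could be collected with dcli, for example "dcli -g dbs_group -l oracle cksum $GRID_HOME/bin/diskmon.bin | diskmon_checksums_differ"; the group file name and the user are assumptions. The log helper would be pointed at the diskmon log under "$GRID_HOME/log/<hostname>/diskmon".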

 

Changes

The issue started after a failed bundle patch apply on the Grid home.

-OR-

The issue started after a failed upgrade/patchset application.

Cause
