Cluster failed to start due to problem with socket pipe npohasd

(Doc ID 1612325.1)

Last updated on JANUARY 19, 2017

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

Symptoms

The CRS stack does not come up on one node.

After a server reboot, the Grid Infrastructure CRS stack fails to start with crsctl start crs. The symptoms point to a problem with the socket/pipe files used by Grid Infrastructure.

The init.ohasd process is running normally after the reboot:
   test-133(root)/>ps -ef|grep init
   root 28717 28382  2 19:03:20 pts/9     0:00 grep d.bin
   root     28756     1  0 10:01 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run   
   root 28676 27170  0 19:02:57 pts/5     0:00
 
Deleting the files under /tmp/.oracle, /usr/tmp/.oracle and /var/tmp/.oracle did not help:
   rm -rf /tmp/.oracle/* /usr/tmp/.oracle/* /var/tmp/.oracle/* 
 
The CRS stack still does not start, as shown below:
   test-133(root)/>/oracle/app/11.2.0/grid/bin/crsctl start crs
   test-133(root)/>ps -ef|grep d.bin
   root 28717 28382  2 19:03:20 pts/9     0:00 grep d.bin
   root 28680 0         19:02:57           0:00 /oracle/app/11.2.0/grid/bin/ohasd.bin reboot

ohasd.bin comes up for a fraction of a second, and one named pipe (file type "p" in the listing) is created:
   ls -lrt /tmp/.oracle

    prw-r--r-- 1 root root 0 Jan 6 09:50 npohasd
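
The leading "p" in the listing above marks npohasd as a named pipe (FIFO) rather than a socket. A minimal sketch for checking the file type in each of the three directories mentioned earlier follows. The helper name describe_pipe is hypothetical and not part of any Oracle tool; the script only reads file metadata and changes nothing.

```shell
#!/bin/sh
# describe_pipe is a hypothetical helper for this sketch, not an Oracle utility.
# It prints the kind of file found at the given path.
describe_pipe() {
    path="$1"
    if [ -p "$path" ]; then
        echo "fifo"       # named pipe, shown as "p" by ls -l
    elif [ -S "$path" ]; then
        echo "socket"     # UNIX domain socket, shown as "s" by ls -l
    elif [ -e "$path" ]; then
        echo "other"      # exists, but is neither a pipe nor a socket
    else
        echo "missing"
    fi
}

# Directories taken from the rm command earlier in this note.
for dir in /tmp/.oracle /usr/tmp/.oracle /var/tmp/.oracle; do
    echo "$dir/npohasd: $(describe_pipe "$dir/npohasd")"
done
```

On a node where the pipe exists as in the listing, the corresponding line would report "fifo"; directories without the file report "missing".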

 

A trace of the ohasd.bin process (tusc here; strace or truss on other platforms) shows it sleeping in an open() call:
   test-133:(root)/>/hpk/tusc -faep -T %H:%M:%S -p 28680
  ( Attached to process 28680 ("/oracle/app/11.2.0/grid/bin/ohasd.bin reboot") [64-bit] )
  19:03:42 [28680] open(0x40000000007789b0, O_WRONLY|0x800, 023240) [sleeping]
 
  tusc:
  ttrace(TT_PROC_STOP, 0, 0, 0, 0, 0): Permission denied
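
The trace shows a write-only open() that stays in the sleeping state. The trace prints only an address, so it is an assumption that the path being opened is the npohasd pipe. If it is, the behavior matches standard FIFO semantics: a blocking write-only open of a FIFO does not return until another process opens the same FIFO for reading. The sketch below reproduces that behavior on a scratch FIFO only. The helper name try_open_for_write and the file name demo_pipe are hypothetical, and the script assumes a shell environment where mktemp -d and mkfifo are available.

```shell
#!/bin/sh
# Demonstration on a scratch FIFO only; it never touches /tmp/.oracle.
# try_open_for_write is a hypothetical helper name for this sketch.
try_open_for_write() {
    fifo="$1"
    marker="$fifo.opened"
    rm -f "$marker"
    # The child opens the FIFO write-only, then drops a marker file.
    # The open blocks until another process opens the FIFO for reading.
    sh -c ': > "$1" && touch "$2"' sh "$fifo" "$marker" >/dev/null 2>&1 &
    writer=$!
    sleep 1
    if [ -e "$marker" ]; then
        wait "$writer" || true
        echo "opened"
    else
        # No marker after one second: the writer is still stuck in open().
        kill "$writer" 2>/dev/null || true
        wait "$writer" 2>/dev/null || true
        echo "blocked"
    fi
}

scratch=$(mktemp -d)
mkfifo "$scratch/demo_pipe"

# Case 1: no reader exists, so the write-only open stays blocked.
no_reader=$(try_open_for_write "$scratch/demo_pipe")
echo "without reader: $no_reader"

# Case 2: a reader is waiting, so the same open returns promptly.
cat "$scratch/demo_pipe" >/dev/null &
reader=$!
with_reader=$(try_open_for_write "$scratch/demo_pipe")
wait "$reader" || true
echo "with reader: $with_reader"

rm -rf "$scratch"
```

This only illustrates why a writer can appear hung on a pipe that has no reader. It does not establish which process is expected to read npohasd or why that reader is absent on the affected node.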
  

Changes

The problem started after a patching attempt failed and the server was rebooted.

 

Cause
