CRS startup failed with socket file permission issue
(Doc ID 2489770.1)
Last updated on FEBRUARY 25, 2022
Applies to:
Exadata X7-2 Hardware - Version All Versions and later
Information in this document applies to any platform.
Symptoms
CRS went down on the first two of the three compute nodes and could not be restarted.
++ Errors in the alert log file.
2018-12-28 13:24:59.449 [ORAAGENT(55344)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 55344
2018-12-28 13:24:59.499 [ORAAGENT(55344)]CRS-5823: Could not initialize agent framework. Details at (:CRSAGF00120:) in /u01/app/oracle/diag/crs/####DM02/crs/trace/ohasd_oraagent_oracle.trc.
2018-12-28 13:28:28.735 [OHASD(39826)]CRS-5828: Could not start agent '/u01/app/12.2.0.1/grid/bin/oraagent_oracle'. Details at (:CRSAGF00130:) {0:0:2} in /u01/app/oracle/diag/crs/####DM02/crs/trace/ohasd.trc.
2018-12-28 13:28:28.908 [ORAAGENT(87341)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 87341
2018-12-28 13:28:28.914 [ORAAGENT(87341)]CRS-5823: Could not initialize agent framework. Details at (:CRSAGF00120:) in /u01/app/oracle/diag/crs/####DM02/crs/trace/ohasd_oraagent_oracle.trc.
2018-12-28 13:31:58.885 [OHASD(39826)]CRS-5828: Could not start agent '/u01/app/12.2.0.1/grid/bin/oraagent_oracle'. Details at (:CRSAGF00130:) {0:0:2} in /u01/app/oracle/diag/crs/####DM02/crs/trace/ohasd.trc.
2018-12-28 13:31:58.916 [ORAAGENT(100543)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 100543
++ Found IPC errors from the ohasd agent trace file.
2018-12-28 13:24:59.496 : CRSCOMM:4032603904: Ipc: sendWork thread started.
...
2018-12-28 13:24:59.496 : CRSCOMM:4235567168: Ipc: Starting send thread
2018-12-28 13:24:59.496 : CRSCOMM:4032603904: Ipc: sendWork thread started.
2018-12-28 13:24:59.499 : GIPCNET:4235567168: gipcmodNetworkProcessConnect: [network] failed connect attempt endp 0x2b7ecd0 [0000000000000019] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=00000000-00000000-0))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_IPC_SOCKET_11)(GIPCID=00000000-00000000-0))', numPend 0, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x2c112d0, sendp 0x2c11080 status 13flags 0xa1088712, flags-2 0x0, usrFlags 0x30000 }, req 0x2b80850 [0000000000000024] { gipcConnectRequest : addr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_IPC_SOCKET_11)(GIPCID=00000000-00000000-0))', parentEndp 0x2b7ecd0, ret gipcretPermissions (9), objFlags 0x0, reqFlags 0x2 }
2018-12-28 13:24:59.499 : GIPCNET:4235567168: gipcmodNetworkProcessConnect: slos op : sgipcnDSConnectHelper
2018-12-28 13:24:59.499 : GIPCNET:4235567168: gipcmodNetworkProcessConnect: slos dep : Permission denied (13)
2018-12-28 13:24:59.499 : GIPCNET:4235567168: gipcmodNetworkProcessConnect: slos loc : connect
2018-12-28 13:24:59.499 : GIPCNET:4235567168: gipcmodNetworkProcessConnect: slos info: failed to /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
2018-12-28 13:24:59.499 :GIPCXCPT:4235567168: gipcInternalConnectSync: failed sync request, ret gipcretPermissions (9)
2018-12-28 13:24:59.499 :GIPCXCPT:4235567168: gipcConnectSyncF [connectToServer : clsIpcClient.cpp : 380]: EXCEPTION[ ret gipcretPermissions (9) ] failed sync connect endp 0x2b7ecd0 [0000000000000019] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=00000000-00000000-0))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_IPC_SOCKET_11)(GIPCID=00000000-00000000-0))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x2c112d0, sendp 0x2c11080 status 13flags 0xa108871a, flags-2 0x0, usrFlags 0x30000 }, addr 0x2b800e0 [0000000000000020] { gipcAddress : name 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_IPC_SOCKET_11)(GIPCID=00000000-00000000-0))', objFlags 0x0, addrFlags 0x4 }, flags 0x0
2018-12-28 13:24:59.499 : CRSCOMM:4235567168: IpcC: gipcConnect() failed, rc= 9
2018-12-28 13:24:59.499 : CRSCOMM:4235567168: [FFAIL] IpcC: Could not connect to (ADDRESS=(PROTOCOL=IPC)(KEY=OHASD_IPC_SOCKET_11)) ret = 9
2018-12-28 13:24:59.499 :CLSFRAME:4235567168: Failure at IPC connect to server:2
2018-12-28 13:24:59.499 :CLSFRAME:4235567168: Unable to start module-to-module comms: 1
2018-12-28 13:24:59.506 : AGENT:4235567168: Created alert : (:CRSAGF00120:) : Agent Framework failed to start:1
2018-12-28 13:24:59.506 : AGENT:4235567168: Agfw calling user exitCB, will exit on return
2018-12-28 13:24:59.506 : AGENT:4235567168: returned from user exitCB, exiting
2018-12-28 13:24:59.506 : AGENT:4235567168: Agent is exiting with exit code: 1
Trace file /u01/app/oracle/diag/crs/####DM02/crs/trace/ohasd_oraagent_oracle.trc
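To pull only the permission-related IPC failures out of a long agent trace such as the one above, a small filter can help. This is a minimal sketch, not an Oracle-supplied tool: the function names `ipc_permission_errors` and `ipc_permission_error_count` are hypothetical, and the search patterns are taken from the trace lines shown in this note.

```shell
# Hypothetical helpers (not part of Grid Infrastructure).
# Print the trace lines that report a permission failure on an IPC connect.
# Usage: ipc_permission_errors /path/to/ohasd_oraagent_oracle.trc
ipc_permission_errors() {
    trace_file="$1"
    # Patterns copied from the trace excerpt in this note.
    grep -E 'gipcretPermissions|Permission denied \(13\)' "$trace_file"
}

# Print how many such lines the trace file contains.
ipc_permission_error_count() {
    trace_file="$1"
    grep -c -E 'gipcretPermissions|Permission denied \(13\)' "$trace_file"
}
```

A non-zero count, together with the `slos info` line naming the socket file, points at the path whose permissions should be checked next.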
Changes
Problem Node (Node 2)
===========================
Owner of the .oracle folder:
----------------------------------------------
[root@####DM02 tmp]# date
Fri Dec 28 15:30:19 +03 2018
[root@####DM02 tmp]# ls -ld .oracle
drwxrwxrwt 2 root root 4096 Dec 28 15:25 .oracle
[root@####DM02 tmp]#
Owner of the var folder:
--------------------------------------
[root@####DM02 /]# ls -ld var
drwx------ 18 1000 1000 4096 Dec 27 14:39 var
[root@####DM02 /]# id root
uid=0(root) gid=0(root) groups=11141(sapinst),0(root)
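Compared with the working node (node 3), files appear to be missing on node 2, and the ownership and permissions of the .oracle and var folders do not match. On node 2, .oracle is owned by root:root and /var is drwx------ owned by UID/GID 1000. On node 3, .oracle is owned by root:oinstall and /var is drwxr-xr-x owned by root:root.
Working Node (Node 3)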
========================
Owner of the .oracle folder:
------------------------------
[root@####dm03 tmp]# date
Fri Dec 28 15:30:59 +03 2018
[root@####dm03 tmp]# ls -ld .oracle
drwxrwxrwt 2 root oinstall 4096 Dec 28 14:24 .oracle
Owner of the var folder:
----------------------------
[root@####dm03 var]# cd ..
[root@####dm03 /]# ls -ld var
drwxr-xr-x 18 root root 4096 Jul 31 11:52 var
[root@####dm03 /]# id root
uid=0(root) gid=0(root) groups=11141(sapinst),0(root)
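The manual `ls -ld` comparison above can be repeated for every directory on the way to the socket file in one pass. The following is a minimal sketch assuming GNU `stat` (as found on Linux); the function name `show_path_permissions` is hypothetical. Running it with the same path on the problem node and on the working node gives two listings that can be compared line by line.

```shell
# Hypothetical helper: print mode, numeric and named owner/group for a path
# and for each of its parent directories, from the path itself up to "/".
# Usage: show_path_permissions /var/tmp/.oracle
show_path_permissions() {
    current="$1"
    while :; do
        # GNU stat format: %A symbolic mode, %u/%g numeric IDs,
        # %U/%G owner and group names, %n file name.
        stat -c '%A %u:%g (%U:%G) %n' "$current"
        parent=$(dirname "$current")
        # dirname returns its argument unchanged for "/" (and "."), so stop there.
        [ "$parent" = "$current" ] && break
        current="$parent"
    done
}
```

Printing the numeric IDs alongside the names makes an owner such as the `1000 1000` seen on node 2 visible even when no matching account exists.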
Cause