SuperCluster CRS RAC Node evictions caused by Golden Gate cache management
(Doc ID 2828011.1)
Last updated on JUNE 20, 2023
Applies to:
Oracle SuperCluster Specific Software - Version 3.x and later
Information in this document applies to any platform.
Symptoms
Random node evictions occur in which several Oracle RAC Grid Infrastructure processes can fail to get on CPU.
Node eviction on 11/02/2021 between 07:50 and 07:57, as logged on <node2>:
2021-11-02 03:37:21.422 [CLSECHO(6843)]ACFS-9294: updating file /etc/oracledrivers.conf
2021-11-02 03:37:22.736 [CLSECHO(6973)]ACFS-9316: Valid ADVM/ACFS distribution media detected at: '/u01/app/19.0.0.0/grid/usm/install/Solaris/5.11/sparcv9/bin'
2021-11-02 07:48:10.956 [CLSECHO(27454)]ACFS-9294: updating file /etc/oracledrivers.conf
2021-11-02 07:54:24.540 [OCSSD(22930)]CRS-7503: The Oracle Grid Infrastructure process 'ocssd' observed communication issues between node '<node2>' and node '<node1>', interface list of local node '<node2>' is '192.168.10.116:48755;', interface list of remote node '<node1>' is '192.168.10.115:57996;'.
2021-11-02 07:54:26.542 [OCSSD(22930)]CRS-7503: The Oracle Grid Infrastructure process 'ocssd' observed communication issues between node '<node2>' and node '<node1>', interface list of local node '<node2>' is '192.168.10.116:48755;', interface list of remote node '<node1>' is '192.168.10.115:57996;'.
2021-11-02 07:54:29.153 [OHASD(22428)]CRS-8011: reboot advisory message from host: <node1>, component: cssmonit, with time stamp: L-2021-11-02-07:54:29.152
2021-11-02 07:54:29.154 [OHASD(22428)]CRS-8013: reboot advisory message text: Rebooting node due to connection problems with CSS
2021-11-02 07:54:29.155 [OHASD(22428)]CRS-8011: reboot advisory message from host: <node1>, component: cssagent, with time stamp: L-2021-11-02-07:54:29.152
2021-11-02 07:54:29.156 [OHASD(22428)]CRS-8013: reboot advisory message text: Rebooting node due to connection problems with CSS
2021-11-02 07:54:44.830 [OCSSD(22930)]CRS-1612: Network communication with node <node1> (1) has been missing for 50% of the timeout interval. If this persists, removal of this node from cluster will occur in 29.724 seconds
2021-11-02 07:54:59.834 [OCSSD(22930)]CRS-1611: Network communication with node <node1> (1) has been missing for 75% of the timeout interval. If this persists, removal of this node from cluster will occur in 14.721 seconds
2021-11-02 07:55:08.837 [OCSSD(22930)]CRS-1610: Network communication with node <node1> (1) has been missing for 90% of the timeout interval. If this persists, removal of this node from cluster will occur in 5.718 seconds
2021-11-02 07:55:31.085 [OCSSD(22930)]CRS-1632: Node <node1> is being removed from the cluster in cluster incarnation 527963034
2021-11-02 07:55:34.115 [OCSSD(22930)]CRS-1601: CSSD Reconfiguration complete. Active nodes are <node2> .
2021-11-02 07:55:34.132 [CRSD(23635)]CRS-5504: Node down event reported for node '<node1>'.<<<<<<<<<<<<<<<<<<<
2021-11-02 07:55:41.808 [CRSD(23635)]CRS-2773: Server '<node1>' has been removed from pool 'ora.ADRSBX02'.
2021-11-02 07:55:41.810 [CRSD(23635)]CRS-2773: Server '<node1>' has been removed from pool 'ora.ADRMNT01'.
2021-11-02 07:55:41.813 [CRSD(23635)]CRS-2773: Server '<node1>' has been removed from pool 'Generic'.
2021-11-02 07:55:58.326 [EVMD(22664)]CRS-7500: The Oracle Grid Infrastructure process 'evmd' failed to establish Oracle Grid Interprocess Communication (GIPC) high availability connection with remote node '<node1>'.
2021-11-02 07:56:00.940 [GIPCD(22848)]CRS-7506: Failed to establish bootstap connetion with node '<node1>'
2021-11-02 07:56:01.140 [CRSD(23635)]CRS-7500: The Oracle Grid Infrastructure process 'crsd' failed to establish Oracle Grid Interprocess Communication (GIPC) high availability connection with remote node '<node1>'.
2021-11-02 07:57:10.820 [OCSSD(22930)]CRS-1601: CSSD Reconfiguration complete. Active nodes are <node1> <node2> .<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
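The CRS-1612, CRS-1611 and CRS-1610 warnings above report that network communication has been missing for 50%, 75% and 90% of the timeout interval, each with the seconds remaining before the node is removed. One way to sanity-check such an excerpt is to add each remaining time to its timestamp and compare the projected removal times. The following is a minimal Python sketch of that check. It assumes the messages are worded exactly as in this excerpt, and the names `WARNING_RE`, `projected_removals` and `SAMPLE` are hypothetical and not part of any Oracle tool.

```python
import re
from datetime import datetime, timedelta

# Matches CRS-1610/1611/1612 warnings worded as in the excerpt above.
# Assumption: other Grid Infrastructure releases may phrase them differently.
WARNING_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[OCSSD\(\d+\)\]"
    r"CRS-161[012]: Network communication with node (?P<node>\S+) \(\d+\) "
    r"has been missing for (?P<pct>\d+)% of the timeout interval\..*?"
    r"will occur in (?P<secs>[\d.]+) seconds"
)

# Hypothetical input: the three warning lines copied from the excerpt above.
SAMPLE = [
    "2021-11-02 07:54:44.830 [OCSSD(22930)]CRS-1612: Network communication with node <node1> (1) has been missing for 50% of the timeout interval. If this persists, removal of this node from cluster will occur in 29.724 seconds",
    "2021-11-02 07:54:59.834 [OCSSD(22930)]CRS-1611: Network communication with node <node1> (1) has been missing for 75% of the timeout interval. If this persists, removal of this node from cluster will occur in 14.721 seconds",
    "2021-11-02 07:55:08.837 [OCSSD(22930)]CRS-1610: Network communication with node <node1> (1) has been missing for 90% of the timeout interval. If this persists, removal of this node from cluster will occur in 5.718 seconds",
]


def projected_removals(lines):
    """Return (node, percent_missing, projected_removal_time) per warning line."""
    results = []
    for line in lines:
        m = WARNING_RE.search(line)
        if m is None:
            continue  # not a CRS-161x warning, skip it
        logged_at = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        deadline = logged_at + timedelta(seconds=float(m["secs"]))
        results.append((m["node"], int(m["pct"]), deadline))
    return results


if __name__ == "__main__":
    for node, pct, deadline in projected_removals(SAMPLE):
        # Trim microseconds to milliseconds to match the log's precision.
        print(node, f"{pct}%", deadline.strftime("%H:%M:%S.%f")[:-3])
```

For the three warnings above, the sketch projects removal at 07:55:14.554 to 07:55:14.555 in every case. The CRS-1632 removal message itself is logged later, at 07:55:31.085.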
============
In one eviction case, the diskmon log from the time of the issue shows: " dskm_node_guids_are_offline: Node's GUID's are not available, return FALSE"
Here is an example of the panic string:
Nov 2 07:54:25 <node1> genunix: [ID 603404 kern.notice] NOTICE: core_log: ocssd.bin[6331] core dumped: /var/share/cores/core_<node1>_ocssd.bin
Nov 2 07:54:25 <node1> genunix: [ID 209915 kern.notice] NOTICE: core_log: ocssd.bin[6331] no diagnostic core file pattern exists
Nov 2 07:54:29 <node1> genunix: [ID 457151 kern.warning] WARNING: Oracle RAC node (<node1>) eviction initiated at request of process id 6301 (cssdagent)
Nov 2 07:54:29 <node1> unix: [ID 836849 kern.notice]
Nov 2 07:54:29 <node1> panic[cpu0]/thread=184006d5cefc0:
Nov 2 07:54:29 <node1> unix: [ID 493256 kern.notice] Oracle RAC node eviction
Nov 2 07:54:29 <node1> unix: [ID 100000 kern.notice]
Nov 2 07:54:29 <node1> genunix: [ID 723222 kern.notice] 000002a108bd9900 genunix:kadmin+688 (1, 1, 0, 10, 184006d141978, 0)
Nov 2 07:54:29 <node1> genunix: [ID 179002 kern.notice] %l0-3: 0000000000000004 00000000208d9000 0001840020085e18 0000000000000004
Nov 2 07:54:29 <node1> %l4-7: 0000000000000004 00000000000005d8 0000000000000004 00000000000000f8
Nov 2 07:54:29 <node1> genunix: [ID 723222 kern.notice] 000002a108bd99c0 genunix:uadmin+1bc (5, 1, 0, 203d1000, 18400629ec8f8, 4)
Nov 2 07:54:29 <node1> genunix: [ID 179002 kern.notice] %l0-3: 000184006ed60318 000184002009a000 0000000029e9a312 0000000000000000
Nov 2 07:54:29 <node1> %l4-7: 0000000029e9a312 0000000029e9a311 0000000000000000 0000000000000000
Nov 2 07:54:29 <node1> unix: [ID 100000 kern.notice]
Nov 2 07:54:29 <node1> genunix: [ID 672855 kern.notice] syncing file systems...
Nov 2 07:54:29 <node1> genunix: [ID 904073 kern.notice] done
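On the evicted node, three kernel messages carry most of the sequence: the `ocssd.bin` core dump at 07:54:25, the eviction request from `cssdagent` (process id 6301) at 07:54:29, and the `Oracle RAC node eviction` panic. You can line these up against the CRS-8011 reboot advisory that <node2> logged at 07:54:29. The following is a minimal, hypothetical Python helper that extracts these three messages from a messages excerpt. Its regular expressions assume the message wording and the syslog timestamp layout shown here, and they may need adjusting for other Solaris or Grid Infrastructure releases.

```python
import re

# Syslog prefix as shown above: "Mon D HH:MM:SS host " (no year).
PREFIX = r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
CORE_RE = re.compile(
    PREFIX + r".*core_log: (?P<proc>[^\[\s]+)\[(?P<pid>\d+)\] core dumped: (?P<path>\S+)"
)
EVICT_RE = re.compile(
    PREFIX + r".*Oracle RAC node \((?P<node>[^)]+)\) eviction initiated "
    r"at request of process id (?P<pid>\d+) \((?P<proc>[^)]+)\)"
)
PANIC_RE = re.compile(PREFIX + r"unix: .*Oracle RAC node eviction\s*$")

# Hypothetical input: lines copied from the panic excerpt above.
SAMPLE = [
    "Nov 2 07:54:25 <node1> genunix: [ID 603404 kern.notice] NOTICE: core_log: ocssd.bin[6331] core dumped: /var/share/cores/core_<node1>_ocssd.bin",
    "Nov 2 07:54:25 <node1> genunix: [ID 209915 kern.notice] NOTICE: core_log: ocssd.bin[6331] no diagnostic core file pattern exists",
    "Nov 2 07:54:29 <node1> genunix: [ID 457151 kern.warning] WARNING: Oracle RAC node (<node1>) eviction initiated at request of process id 6301 (cssdagent)",
    "Nov 2 07:54:29 <node1> unix: [ID 493256 kern.notice] Oracle RAC node eviction",
]


def summarize(lines):
    """Collect core dumps, the eviction request and the panic timestamp."""
    summary = {"core_dumps": [], "eviction": None, "panic_ts": None}
    for line in lines:
        if m := CORE_RE.search(line):
            summary["core_dumps"].append((m["ts"], m["proc"], int(m["pid"]), m["path"]))
        elif m := EVICT_RE.search(line):
            summary["eviction"] = (m["ts"], m["node"], int(m["pid"]), m["proc"])
        elif m := PANIC_RE.search(line):
            summary["panic_ts"] = m["ts"]
    return summary


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

The helper keeps the timestamps as strings because these syslog lines carry no year. It also ignores the "no diagnostic core file pattern" notice, because that line does not report a dump path.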
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Changes
Golden Gate was installed with default settings and configured to back up and manage redo logs.
Cause