Corosync Crashes with Error "abrt-hook-ccpp: Process 21424 (corosync) of user 0 killed by SIGBUS - dumping core" in Syslog (Doc ID 2527123.1)

Last updated on APRIL 24, 2020

Applies to:

Linux OS - Version Oracle Linux 7.3 to Oracle Linux 7.4 [Release OL7U3 to OL7U4]
Linux x86-64

Symptoms

A Corosync/Pacemaker node might be evicted from the running cluster, with syslog messages similar to the following on the fenced node and on the Designated Controller (DC) node of the cluster:

Mar 31 23:10:51 [21431] <> pacemakerd: info: mcp_quorum_destroy: connection lost
Mar 31 23:10:51 [21431] <> pacemakerd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Mar 31 23:10:51 [21437] <> crmd: error: crmd_quorum_destroy: connection terminated
Mar 31 23:10:51 [21433] <> stonith-ng: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Mar 31 23:10:51 [21431] <> pacemakerd: error: mcp_cpg_destroy: Connection destroyed
Mar 31 23:10:51 [21437] <> crmd: info: qb_ipcs_us_withdraw: withdrawing server sockets
Mar 31 23:10:51 [21431] <> pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Mar 31 23:10:51 [21433] <> stonith-ng: error: stonith_peer_cs_destroy: Corosync connection terminated
Mar 31 23:10:51 [21433] <> stonith-ng: info: stonith_shutdown: Terminating with 2 clients
Mar 31 23:10:51 [21432] <> cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Mar 31 23:10:51 [21435] <> attrd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Mar 31 23:10:51 [21432] <> cib: error: cib_cs_destroy: Corosync connection lost! Exiting.
Mar 31 23:10:51 [21432] <> cib: info: terminate_cib: cib_cs_destroy: Exiting fast...
Mar 31 23:10:51 [21433] <> stonith-ng: info: cib_connection_destroy: Connection to the CIB closed.
Mar 31 23:10:51 [21435] <> attrd: crit: attrd_cpg_destroy: Lost connection to Corosync service!
Mar 31 23:10:51 [21435] <> attrd: info: attrd_shutdown: Shutting down
Mar 31 23:10:51 [21437] <> crmd: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Mar 31 23:10:51 [21435] <> attrd: info: main: Shutting down attribute manager
Mar 31 23:10:51 [21437] <> crmd: notice: crmd_exit: Forcing immediate exit with status 67: Link has been severed
Mar 31 23:10:51 [21437] <> crmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Mar 31 23:10:51 [21435] <> attrd: notice: crm_client_disconnect_all: Disconnecting client 0x5567c6ce44f0, pid=21437...
Mar 31 23:10:51 [21435] <> attrd: info: qb_ipcs_us_withdraw: withdrawing server sockets
Mar 31 23:10:51 [21433] <> stonith-ng: info: qb_ipcs_us_withdraw: withdrawing server sockets
Mar 31 23:10:51 [21435] <> attrd: info: attrd_cib_destroy_cb: Connection disconnection complete
Mar 31 23:10:51 [21434] <> lrmd: error: crm_ipc_read: Connection to stonith-ng failed
Mar 31 23:10:51 [21434] <> lrmd: error: mainloop_gio_callback: Connection to stonith-ng[0x55e83b2f1350] closed (I/O condition=17)
Mar 31 23:10:51 [21435] <> attrd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Mar 31 23:10:51 [21432] <> cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Mar 31 23:10:51 [21434] <> lrmd: error: stonith_connection_destroy_cb: LRMD lost STONITH connection
Mar 31 23:10:51 [21434] <> lrmd: error: stonith_connection_failed: STONITH connection failed, finalizing 1 pending operations.
Mar 31 23:10:51 [21432] <> cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Mar 31 23:10:51 [21433] <> stonith-ng: info: main: Done
Mar 31 23:10:51 [21433] <> stonith-ng: info: crm_xml_cleanup: Cleaning up memory from libxml2
Mar 31 23:10:51 [21432] <> cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Mar 31 23:10:51 [21432] <> cib: info: crm_xml_cleanup: Cleaning up memory from libxml2
Mar 31 23:10:51 [21434] <> lrmd: info: cancel_recurring_action: Cancelling ocf operation share2vg_monitor_60000
Mar 31 23:10:51 [21434] <> lrmd: warning: qb_ipcs_event_sendv: new_event_notification (21434-21437-8): Bad file descriptor (9)
Mar 31 23:10:51 [21434] <> lrmd: warning: send_client_notify: Notification of client crmd/873e9550-184d-4e44-979e-666285f30251 failed

...

Mar 31 23:10:51 <> abrt-hook-ccpp: Process 21424 (corosync) of user 0 killed by SIGBUS - dumping core
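
The SIGBUS message indicates that the corosync process itself crashed and that ABRT captured a core dump on the fenced node. Below is a minimal sketch of locating that dump for analysis, assuming the default ABRT DumpLocation of /var/spool/abrt (check DumpLocation in /etc/abrt/abrt.conf if your system uses a different path); it only reads the standard ABRT problem-data files and prints where the core dump was written.

#!/usr/bin/env python
# Sketch: find ABRT problem directories created for corosync crashes.
# Assumes the default DumpLocation of /var/spool/abrt; adjust if
# /etc/abrt/abrt.conf points elsewhere. Run as root.
import os

DUMP_LOCATION = "/var/spool/abrt"

def read_item(problem_dir, name):
    # Each ABRT problem directory stores items such as "executable",
    # "reason" and "time" as small text files; return one if present.
    path = os.path.join(problem_dir, name)
    try:
        with open(path) as f:
            return f.read().strip()
    except IOError:
        return None

for entry in sorted(os.listdir(DUMP_LOCATION)):
    problem_dir = os.path.join(DUMP_LOCATION, entry)
    if not os.path.isdir(problem_dir):
        continue
    executable = read_item(problem_dir, "executable") or ""
    if not executable.endswith("corosync"):
        continue
    print("Problem directory: " + problem_dir)
    print("  executable     : " + executable)
    print("  reason         : " + str(read_item(problem_dir, "reason")))
    print("  core dump      : " + os.path.join(problem_dir, "coredump"))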

The cluster Designated Controller (DC) may log messages similar to the following:

Mar 31 23:10:51 <DC: <HOSTNAME2>> corosync[15007]: [TOTEM ] A new membership (<IP address>:24) was formed. Members joined: 2 left: 2
Mar 31 23:10:51 <DC: <HOSTNAME2>> corosync[15007]: [TOTEM ] Failed to receive the leave message. failed: 2
Mar 31 23:10:51 <DC: <HOSTNAME2>> stonith-ng[15016]: notice: Node <> state is now lost
Mar 31 23:10:51 <DC: <HOSTNAME2>> attrd[15018]: notice: Node <> state is now lost
Mar 31 23:10:51 <DC: <HOSTNAME2>> attrd[15018]: notice: Removing all <> attributes for peer loss
Mar 31 23:10:51 <DC: <HOSTNAME2>> attrd[15018]: notice: Purged 1 peers with id=2 and/or uname=<> from the membership cache
Mar 31 23:10:51 <DC: <HOSTNAME2>> stonith-ng[15016]: notice: Purged 1 peers with id=2 and/or uname=<> from the membership cache
Mar 31 23:10:51 <DC: <HOSTNAME2>> corosync[15007]: [QUORUM] Members[4]: 1 2 3 4
Mar 31 23:10:51 <DC: <HOSTNAME2>> corosync[15007]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 31 23:10:51 <DC: <HOSTNAME2>> crmd[15020]: warning: No reason to expect <HOSTNAME1> to be down
Mar 31 23:10:51 <DC: <HOSTNAME2>> crmd[15020]: notice: Stonith/shutdown of <> not matched
Mar 31 23:10:51 <DC: <HOSTNAME2>> cib[15015]: notice: Node <> state is now lost
Mar 31 23:10:51 <DC: <HOSTNAME2>> crmd[15020]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Mar 31 23:10:51 <DC: <HOSTNAME2>> cib[15015]: notice: Purged 1 peers with id=2 and/or uname=<> from the membership cache
Mar 31 23:10:52 <DC: <HOSTNAME2>> attrd[15018]: notice: Node <> state is now member
Mar 31 23:10:52 <DC: <HOSTNAME2>> stonith-ng[15016]: notice: Node <> state is now member
Mar 31 23:10:52 <DC: <HOSTNAME2>> cib[15015]: notice: Node <> state is now member
Mar 31 23:10:53 <DC: <HOSTNAME2>> pengine[15019]: warning: Cluster node <> will be fenced: peer process is no longer available
Mar 31 23:10:53 <DC: <HOSTNAME2>> pengine[15019]: warning: Node <> is unclean
Mar 31 23:10:53 <DC: <HOSTNAME2>> pengine[15019]: warning: Scheduling Node <> for STONITH
Mar 31 23:10:53 <DC: <HOSTNAME2>> pengine[15019]: notice: * Fence (reboot) <> 'peer process is no longer available'
Mar 31 23:10:53 <DC: <HOSTNAME2>> pengine[15019]: notice: * Move iLO_<DC: <HOSTNAME2>> ( <> -> <DC: <HOSTNAME2>> )
Mar 31 23:10:53 <DC: <HOSTNAME2>> pengine[15019]: notice: * Move share2vg ( <> -> <node 3> )
Mar 31 23:10:53 <DC: <HOSTNAME2>> pengine[15019]: notice: * Move share2_data ( <> -> <node 3> )
Mar 31 23:10:53 <DC: <HOSTNAME2>> pengine[15019]: notice: * Move share2_log ( <> -> <node 3> )
Mar 31 23:10:53 <DC: <HOSTNAME2>> pengine[15019]: notice: * Move <>_vip ( <> -> <node 3> )
Mar 31 23:10:53 <DC: <HOSTNAME2>> pengine[15019]: notice: * Move cern_apps_<> ( <> -> <node 3> )
Mar 31 23:10:53 <DC: <HOSTNAME2>> pengine[15019]: warning: Calculated transition 27757 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-0.bz2
Mar 31 23:10:53 <DC: <HOSTNAME2>> crmd[15020]: notice: Requesting fencing (reboot) of node <>
Mar 31 23:10:53 <DC: <HOSTNAME2>> crmd[15020]: notice: Initiating start operation iLO_<DC: <HOSTNAME2>>_start_0 locally on <DC: <HOSTNAME2>>
Mar 31 23:10:53 <DC: <HOSTNAME2>> stonith-ng[15016]: notice: Client crmd.15020.9cb89ebf wants to fence (reboot) '<>' with device '(any)'
Mar 31 23:10:53 <DC: <HOSTNAME2>> stonith-ng[15016]: notice: Requesting peer fencing (reboot) of <>
Mar 31 23:10:53 <DC: <HOSTNAME2>> stonith-ng[15016]: notice: iLO_<DC: <HOSTNAME2>> can not fence (reboot) <>: static-list
Mar 31 23:10:53 <DC: <HOSTNAME2>> stonith-ng[15016]: notice: iLO_<> can fence (reboot) <>: static-list
Mar 31 23:10:53 <DC: <HOSTNAME2>> stonith-ng[15016]: notice: iLO_<node 3> can not fence (reboot) <>: static-list
Mar 31 23:10:53 <DC: <HOSTNAME2>> stonith-ng[15016]: notice: iLO_<node 4> can not fence (reboot) <>: static-list
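
The policy engine on the DC saved the inputs for this fencing transition to /var/lib/pacemaker/pengine/pe-warn-0.bz2 (see the "Calculated transition" line above). pe-warn/pe-input files are bzip2-compressed CIB XML snapshots, so they can be decompressed and reviewed offline; the sketch below assumes that path from the log and the standard CIB layout.

#!/usr/bin/env python
# Sketch: decompress a Pacemaker pe-warn/pe-input file and list the
# cluster nodes recorded in the CIB snapshot it contains.
# The path is taken from the DC log above and may differ on other clusters.
import bz2
import xml.etree.ElementTree as ET

PE_FILE = "/var/lib/pacemaker/pengine/pe-warn-0.bz2"

with bz2.BZ2File(PE_FILE) as f:
    cib_xml = f.read()

cib = ET.fromstring(cib_xml)

# Cluster nodes are defined under /cib/configuration/nodes in the snapshot.
for node in cib.findall("./configuration/nodes/node"):
    print("node id={0} uname={1}".format(node.get("id"), node.get("uname")))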

 

Changes

No known changes were made to the system prior to this outage.

Cause
