
Corosync Crashes with Error "abrt-hook-ccpp: Process 21424 (corosync) of user 0 killed by SIGBUS - dumping core" in Syslog (Doc ID 2527123.1)

Last updated on MAY 06, 2019

Applies to:

Linux OS - Version Oracle Linux 7.3 to Oracle Linux 7.4 [Release OL7U3 to OL7U4]
Linux x86-64

Symptoms

A Corosync/Pacemaker node might be evicted from the running cluster, with syslog messages similar to the following on the fenced node and on the Designated Controller (DC) node of the cluster:

Mar 31 23:10:51 [21431] <node 2> pacemakerd: info: mcp_quorum_destroy: connection lost
Mar 31 23:10:51 [21431] <node 2> pacemakerd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Mar 31 23:10:51 [21437] <node 2> crmd: error: crmd_quorum_destroy: connection terminated
Mar 31 23:10:51 [21433] <node 2> stonith-ng: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Mar 31 23:10:51 [21431] <node 2> pacemakerd: error: mcp_cpg_destroy: Connection destroyed
Mar 31 23:10:51 [21437] <node 2> crmd: info: qb_ipcs_us_withdraw: withdrawing server sockets
Mar 31 23:10:51 [21431] <node 2> pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Mar 31 23:10:51 [21433] <node 2> stonith-ng: error: stonith_peer_cs_destroy: Corosync connection terminated
Mar 31 23:10:51 [21433] <node 2> stonith-ng: info: stonith_shutdown: Terminating with 2 clients
Mar 31 23:10:51 [21432] <node 2> cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Mar 31 23:10:51 [21435] <node 2> attrd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Mar 31 23:10:51 [21432] <node 2> cib: error: cib_cs_destroy: Corosync connection lost! Exiting.
Mar 31 23:10:51 [21432] <node 2> cib: info: terminate_cib: cib_cs_destroy: Exiting fast...
Mar 31 23:10:51 [21433] <node 2> stonith-ng: info: cib_connection_destroy: Connection to the CIB closed.
Mar 31 23:10:51 [21435] <node 2> attrd: crit: attrd_cpg_destroy: Lost connection to Corosync service!
Mar 31 23:10:51 [21435] <node 2> attrd: info: attrd_shutdown: Shutting down
Mar 31 23:10:51 [21437] <node 2> crmd: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Mar 31 23:10:51 [21435] <node 2> attrd: info: main: Shutting down attribute manager
Mar 31 23:10:51 [21437] <node 2> crmd: notice: crmd_exit: Forcing immediate exit with status 67: Link has been severed
Mar 31 23:10:51 [21437] <node 2> crmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Mar 31 23:10:51 [21435] <node 2> attrd: notice: crm_client_disconnect_all: Disconnecting client 0x5567c6ce44f0, pid=21437...
Mar 31 23:10:51 [21435] <node 2> attrd: info: qb_ipcs_us_withdraw: withdrawing server sockets
Mar 31 23:10:51 [21433] <node 2> stonith-ng: info: qb_ipcs_us_withdraw: withdrawing server sockets
Mar 31 23:10:51 [21435] <node 2> attrd: info: attrd_cib_destroy_cb: Connection disconnection complete
Mar 31 23:10:51 [21434] <node 2> lrmd: error: crm_ipc_read: Connection to stonith-ng failed
Mar 31 23:10:51 [21434] <node 2> lrmd: error: mainloop_gio_callback: Connection to stonith-ng[0x55e83b2f1350] closed (I/O condition=17)
Mar 31 23:10:51 [21435] <node 2> attrd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Mar 31 23:10:51 [21432] <node 2> cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Mar 31 23:10:51 [21434] <node 2> lrmd: error: stonith_connection_destroy_cb: LRMD lost STONITH connection
Mar 31 23:10:51 [21434] <node 2> lrmd: error: stonith_connection_failed: STONITH connection failed, finalizing 1 pending operations.
Mar 31 23:10:51 [21432] <node 2> cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Mar 31 23:10:51 [21433] <node 2> stonith-ng: info: main: Done
Mar 31 23:10:51 [21433] <node 2> stonith-ng: info: crm_xml_cleanup: Cleaning up memory from libxml2
Mar 31 23:10:51 [21432] <node 2> cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Mar 31 23:10:51 [21432] <node 2> cib: info: crm_xml_cleanup: Cleaning up memory from libxml2
Mar 31 23:10:51 [21434] <node 2> lrmd: info: cancel_recurring_action: Cancelling ocf operation share2vg_monitor_60000
Mar 31 23:10:51 [21434] <node 2> lrmd: warning: qb_ipcs_event_sendv: new_event_notification (21434-21437-8): Bad file descriptor (9)
Mar 31 23:10:51 [21434] <node 2> lrmd: warning: send_client_notify: Notification of client crmd/873e9550-184d-4e44-979e-666285f30251 failed

...

Mar 31 23:10:51 <node 2> abrt-hook-ccpp: Process 21424 (corosync) of user 0 killed by SIGBUS - dumping core
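The abrt-hook-ccpp record above identifies which process crashed and with what signal. A minimal sketch of pulling those fields out of the message (the sample line, node name, and parsing are illustrative; on a live system the line would be grepped out of /var/log/messages, and the captured core is normally spooled by ABRT under /var/spool/abrt/ and visible via `abrt-cli list`):

```shell
# Sample syslog line mirroring the abrt record from the fenced node;
# on a live system: grep abrt-hook-ccpp /var/log/messages
line='Mar 31 23:10:51 node2 abrt-hook-ccpp: Process 21424 (corosync) of user 0 killed by SIGBUS - dumping core'

# Extract the crashed PID and the delivering signal from the message
pid=$(echo "$line" | sed -n 's/.*Process \([0-9]*\) (corosync).*/\1/p')
sig=$(echo "$line" | sed -n 's/.*killed by \(SIG[A-Z]*\).*/\1/p')

echo "corosync pid=$pid signal=$sig"
# prints: corosync pid=21424 signal=SIGBUS
```

A SIGBUS generally indicates an invalid memory/bus access rather than an orderly shutdown, which is consistent with the Pacemaker daemons losing their CPG connections abruptly in the messages above.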

 

The cluster Designated Controller (DC) may log messages similar to the following:

Mar 31 23:10:51 <DC: node 1> corosync[15007]: [TOTEM ] A new membership (<IP address>:24) was formed. Members joined: 2 left: 2
Mar 31 23:10:51 <DC: node 1> corosync[15007]: [TOTEM ] Failed to receive the leave message. failed: 2
Mar 31 23:10:51 <DC: node 1> stonith-ng[15016]: notice: Node <node 2> state is now lost
Mar 31 23:10:51 <DC: node 1> attrd[15018]: notice: Node <node 2> state is now lost
Mar 31 23:10:51 <DC: node 1> attrd[15018]: notice: Removing all <node 2> attributes for peer loss
Mar 31 23:10:51 <DC: node 1> attrd[15018]: notice: Purged 1 peers with id=2 and/or uname=<node 2> from the membership cache
Mar 31 23:10:51 <DC: node 1> stonith-ng[15016]: notice: Purged 1 peers with id=2 and/or uname=<node 2> from the membership cache
Mar 31 23:10:51 <DC: node 1> corosync[15007]: [QUORUM] Members[4]: 1 2 3 4
Mar 31 23:10:51 <DC: node 1> corosync[15007]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 31 23:10:51 <DC: node 1> crmd[15020]: warning: No reason to expect node 2 to be down
Mar 31 23:10:51 <DC: node 1> crmd[15020]: notice: Stonith/shutdown of <node 2> not matched
Mar 31 23:10:51 <DC: node 1> cib[15015]: notice: Node <node 2> state is now lost
Mar 31 23:10:51 <DC: node 1> crmd[15020]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Mar 31 23:10:51 <DC: node 1> cib[15015]: notice: Purged 1 peers with id=2 and/or uname=<node 2> from the membership cache
Mar 31 23:10:52 <DC: node 1> attrd[15018]: notice: Node <node 2> state is now member
Mar 31 23:10:52 <DC: node 1> stonith-ng[15016]: notice: Node <node 2> state is now member
Mar 31 23:10:52 <DC: node 1> cib[15015]: notice: Node <node 2> state is now member
Mar 31 23:10:53 <DC: node 1> pengine[15019]: warning: Cluster node <node 2> will be fenced: peer process is no longer available
Mar 31 23:10:53 <DC: node 1> pengine[15019]: warning: Node <node 2> is unclean
Mar 31 23:10:53 <DC: node 1> pengine[15019]: warning: Scheduling Node <node 2> for STONITH
Mar 31 23:10:53 <DC: node 1> pengine[15019]: notice: * Fence (reboot) <node 2> 'peer process is no longer available'
Mar 31 23:10:53 <DC: node 1> pengine[15019]: notice: * Move iLO_<DC: node 1> ( <node 2> -> <DC: node 1> )
Mar 31 23:10:53 <DC: node 1> pengine[15019]: notice: * Move share2vg ( <node 2> -> <node 3> )
Mar 31 23:10:53 <DC: node 1> pengine[15019]: notice: * Move share2_data ( <node 2> -> <node 3> )
Mar 31 23:10:53 <DC: node 1> pengine[15019]: notice: * Move share2_log ( <node 2> -> <node 3> )
Mar 31 23:10:53 <DC: node 1> pengine[15019]: notice: * Move <node 2>_vip ( <node 2> -> <node 3> )
Mar 31 23:10:53 <DC: node 1> pengine[15019]: notice: * Move cern_apps_<node 2> ( <node 2> -> <node 3> )
Mar 31 23:10:53 <DC: node 1> pengine[15019]: warning: Calculated transition 27757 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-0.bz2
Mar 31 23:10:53 <DC: node 1> crmd[15020]: notice: Requesting fencing (reboot) of node <node 2>
Mar 31 23:10:53 <DC: node 1> crmd[15020]: notice: Initiating start operation iLO_<DC: node 1>_start_0 locally on <DC: node 1>
Mar 31 23:10:53 <DC: node 1> stonith-ng[15016]: notice: Client crmd.15020.9cb89ebf wants to fence (reboot) '<node 2>' with device '(any)'
Mar 31 23:10:53 <DC: node 1> stonith-ng[15016]: notice: Requesting peer fencing (reboot) of <node 2>
Mar 31 23:10:53 <DC: node 1> stonith-ng[15016]: notice: iLO_<DC: node 1> can not fence (reboot) <node 2>: static-list
Mar 31 23:10:53 <DC: node 1> stonith-ng[15016]: notice: iLO_<node 2> can fence (reboot) <node 2>: static-list
Mar 31 23:10:53 <DC: node 1> stonith-ng[15016]: notice: iLO_<node 3> can not fence (reboot) <node 2>: static-list
Mar 31 23:10:53 <DC: node 1> stonith-ng[15016]: notice: iLO_<node 4> can not fence (reboot) <node 2>: static-list
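The static-list messages above show stonith-ng checking each fencing device's configured host list until it finds one able to reboot the lost node. When reviewing a DC's logs after such an eviction, the capable device can be filtered out as in this illustrative sketch (the sample lines, file path, and node names are assumptions; a real run would read the DC's /var/log/messages):

```shell
# Sample DC syslog excerpt mirroring the static-list lines above
cat > /tmp/dc-syslog.txt <<'EOF'
Mar 31 23:10:53 node1 stonith-ng[15016]: notice: iLO_node1 can not fence (reboot) node2: static-list
Mar 31 23:10:53 node1 stonith-ng[15016]: notice: iLO_node2 can fence (reboot) node2: static-list
Mar 31 23:10:53 node1 stonith-ng[15016]: notice: iLO_node3 can not fence (reboot) node2: static-list
EOF

# Print only the device(s) whose static host list allows fencing the node;
# "can not fence" lines do not match the " can fence (reboot)" pattern
sed -n 's/.*notice: \([^ ]*\) can fence (reboot).*/\1/p' /tmp/dc-syslog.txt
# prints: iLO_node2
```

If no device reports "can fence" for the lost node, fencing cannot complete and the cluster stays blocked in the fencing transition, so this is a useful first check.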

Changes

No known changes were made to the system prior to this outage.

Cause


