RHEL 6.6: IPC Send timeout/node eviction etc with high packet reassembles failure (Doc ID 2008933.1)

Last updated on AUGUST 23, 2017

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Generic Linux

Symptoms

On Red Hat Enterprise Linux, or Oracle Linux running the Red Hat Compatible Kernel, the database instance or node fails after an upgrade to 6.6, with messages such as:

Fri May 01 03:05:48 2015
IPC Send timeout detected. Receiver ospid 28660 [oracle@xxxxx (LMS0)]
Fri May 01 03:05:48 2015
Errors in file /xddv1covd/oracle/diag/rdbms/xrcovd/XRCOVD3/trace/XRCOVD3_lms0_28660.trc:
IPC Send timeout detected. Receiver ospid 28670 [oracle@xxxxx (LMS1)]
Fri May 01 03:05:53 2015
Errors in file /xddv1covd/oracle/diag/rdbms/xrcovd/XRCOVD3/trace/XRCOVD3_lms1_28670.trc:
Fri May 01 03:06:00 2015
IPC Send timeout detected. Receiver ospid 31414 [oracle@xxxxx (PZ98)]
Fri May 01 03:06:00 2015
Errors in file /xddv1covd/oracle/diag/rdbms/xrcovd/XRCOVD3/trace/XRCOVD3_pz98_31414.trc:
Fri May 01 03:06:13 2015
IPC Send timeout detected. Receiver ospid 1835 [oracle@xxxxx (PZ97)]
Fri May 01 03:06:13 2015
Errors in file /xddv1covd/oracle/diag/rdbms/xrcovd/XRCOVD3/trace/XRCOVD3_pz97_1835.trc:
Fri May 01 03:06:43 2015
Received an instance abort message from instance 1

Please check instance 1 alert and LMON trace files for detail.

LMS0 (ospid: 28660): terminating the instance due to error 481

Fri May 01 03:06:43 2015

System state dump requested by (instance=3, osid=28660 (LMS0)), summary=[abnormal instance termination].
System State dumped to trace file /xddv1covd/oracle/diag/rdbms/xrcovd/XRCOVD3/trace/XRCOVD3_diag_28625.trc
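
When triaging a long alert log, it can help to list which receiver processes the "IPC Send timeout detected" messages name. The following is a minimal Python sketch, not an Oracle-supplied tool: the function name ipc_timeout_receivers and the regular expression are illustrative assumptions based only on the message format shown in the excerpt above.

```python
import re

# Assumed message format, taken from the excerpt above:
#   IPC Send timeout detected. Receiver ospid 28660 [oracle@xxxxx (LMS0)]
IPC_TIMEOUT = re.compile(
    r"IPC Send timeout detected\. Receiver ospid (\d+) \[[^\]]*\((\w+)\)\]"
)


def ipc_timeout_receivers(alert_log_text):
    """Return (ospid, process name) for each IPC Send timeout message found."""
    return [(int(pid), name) for pid, name in IPC_TIMEOUT.findall(alert_log_text)]


# Two lines copied from the excerpt above, used as sample input.
sample = (
    "Fri May 01 03:05:48 2015\n"
    "IPC Send timeout detected. Receiver ospid 28660 [oracle@xxxxx (LMS0)]\n"
    "Fri May 01 03:05:48 2015\n"
    "IPC Send timeout detected. Receiver ospid 28670 [oracle@xxxxx (LMS1)]\n"
)

for ospid, name in ipc_timeout_receivers(sample):
    print(ospid, name)
```

Timeouts reported against LMS processes, as in this excerpt, point at interconnect traffic rather than at a single misbehaving session.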

 

While this is happening, "netstat -s" shows a sharp jump in the "packet reassembles failed" counter:

==>> before the issue, the following number is more or less stable or increasing slowly
6817 packet reassembles failed
....
==>> in 30 minutes it increased by 50
6867 packet reassembles failed
==>> now the issue is happening and in 10 seconds it increased by 7533 - 6867 = 666
7533 packet reassembles failed
==>> in another 10 seconds it increased by 9630 - 7533 = 2097
9630 packet reassembles failed
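
The comparison above amounts to turning successive counter readings into a rate. The following is a minimal Python sketch of that arithmetic, assuming the counter line has the format shown above; the function names and the relative timestamps (0, 1800, 1810 and 1820 seconds) are illustrative assumptions derived from the "30 minutes" and "10 seconds" annotations, not measured values.

```python
import re

# Matches the counter line as shown above, e.g. "6817 packet reassembles failed".
REASM_FAILED = re.compile(r"^\s*(\d+) packet reassembles failed\s*$", re.MULTILINE)


def reassembles_failed(netstat_output):
    """Return the 'packet reassembles failed' counter, or None if the line is absent."""
    match = REASM_FAILED.search(netstat_output)
    return int(match.group(1)) if match else None


def failure_rates(samples):
    """Given (seconds, counter) samples in time order, return the number of
    failures per second between each consecutive pair of samples."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        rates.append((c1 - c0) / (t1 - t0))
    return rates


# Counter values from the excerpt above; timestamps are assumed relative seconds.
samples = [(0, 6817), (1800, 6867), (1810, 7533), (1820, 9630)]
rates = failure_rates(samples)

for (t, _), rate in zip(samples[1:], rates):
    print(f"t={t}s: {rate:.2f} reassembly failures per second")
```

Under these assumed timestamps the rate is well below one failure per second before the issue and in the tens to hundreds per second once it starts, which is the jump the annotations describe.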

 

Other symptoms could be:

1. node eviction

2. after an instance/node eviction, the instance/node cannot rejoin the cluster until the node where "packet reassembles failed" is occurring has been rebooted

 

Cause
