Server with Unbreakable Enterprise Kernel Becomes Unstable after Being 'rds-ping'ed
(Doc ID 1292928.1)
Last updated on FEBRUARY 02, 2019
Applies to:
Linux OS - Version Oracle Linux 5.5 with Unbreakable Enterprise Kernel [2.6.32] to Oracle Linux 5.5 with Unbreakable Enterprise Kernel [2.6.32] [Release OL5U5]
Oracle Cloud Infrastructure - Version N/A and later
Linux x86-64
Symptoms
In an InfiniBand (IB) network environment, Reliable Datagram Sockets (RDS) is enabled, and the TCP layer has been enabled by following <Document 1271342.1>. The system runs the Unbreakable Enterprise Kernel on Oracle Linux 5.5.
Tests with rds-stress complete successfully without any side effects. When a server is tested for response using rds-ping, the pings succeed, but the pinged host runs into trouble. One symptom is that the target host prints the following on the serial console:
rds_connect_complete: Cannot transition to state UP, current state is 4
RDS: rds_conn_shutdown: failed to transition to state DOWN, current state is 4
rds_connect_complete: Cannot transition to state UP, current state is 4
RDS: rds_conn_shutdown: failed to transition to state DOWN, current state is 4
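To check a captured console or kernel log for the messages quoted above, a small script can count them. The following is a minimal sketch, not an Oracle-provided tool: the function name `count_rds_transition_failures` and the dictionary keys are hypothetical, and the patterns are taken only from the four lines shown in this note.

```python
import re

# Patterns copied from the console messages quoted in this note.
# The captured group is the reported connection state (4 in the example).
RDS_PATTERNS = {
    "connect_complete": re.compile(
        r"rds_connect_complete: Cannot transition to state UP, "
        r"current state is (\d+)"
    ),
    "conn_shutdown": re.compile(
        r"rds_conn_shutdown: failed to transition to state DOWN, "
        r"current state is (\d+)"
    ),
}


def count_rds_transition_failures(log_text):
    """Count how often each RDS state-transition message appears in log_text."""
    counts = {name: 0 for name in RDS_PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in RDS_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "rds_connect_complete: Cannot transition to state UP, current state is 4\n"
        "RDS: rds_conn_shutdown: failed to transition to state DOWN, current state is 4\n"
    )
    print(count_rds_transition_failures(sample))
```

The text passed in could be a saved serial console capture or the saved output of dmesg. A steadily growing count while rds-ping runs against the host would match the symptom described here.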
After rds-ping has been running for some time (how long depends on the system configuration, capacity, etc.):
- System load (loadavg) increases gradually
- After some time processes cannot be killed
- Logging off seems to hang
- New processes cannot be started
- Logons fail
- Running shutdown / reboot hangs
- Sysrq-c or other Sysrq attempts are inconclusive
- Hard reboot seems to be the only solution
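The gradual rise in load average listed above can be spotted early by sampling /proc/loadavg on the target host before it becomes unresponsive. The sketch below is illustrative only: the helper names `parse_loadavg` and `is_load_climbing` and the `min_rise` threshold are assumptions, not part of any Oracle tooling, and the threshold would need tuning for a given system.

```python
def parse_loadavg(line):
    """Return the 1-minute load average from a /proc/loadavg-style line."""
    return float(line.split()[0])


def is_load_climbing(samples, min_rise=1.0):
    """Return True if the samples never decrease and rise by at least min_rise.

    samples  -- 1-minute load averages in the order they were taken
    min_rise -- hypothetical threshold for the total increase
    """
    if len(samples) < 2:
        return False
    non_decreasing = all(b >= a for a, b in zip(samples, samples[1:]))
    return non_decreasing and (samples[-1] - samples[0]) >= min_rise


if __name__ == "__main__":
    # On a Linux host, one sample can be read like this:
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
```

Collecting a sample every minute or so while rds-ping runs, and passing the list to `is_load_climbing`, gives a simple signal to stop the test before logons and process creation start to fail.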
Changes
Upgraded from the Red Hat Compatible Kernel (2.6.18 series) to the Unbreakable Enterprise Kernel (2.6.32 series).
Cause