
Exadata: ORA-29770 or LMS Hang When Bringing an Infiniband Port Down (Doc ID 2633414.1)

Last updated on JANUARY 29, 2020

Applies to:

Oracle Exadata Storage Server Software - Version 19.3.0.0.0 to 19.3.3.0.0 [Release 12.2]
Oracle Database - Enterprise Edition - Version 12.2.0.1 to 19.5.0.0.0 [Release 12.2 to 19]
Linux x86-64

Symptoms

The following conditions must exist for this issue to occur:

ORA-29770 or an LMS hang/crash occurs when one of the IB switches is down, either for planned maintenance or because of an abnormal failure. The IP address on the failed port fails over successfully, but applications using the failed port experience a long brownout for as long as the port stays down. The application brownout is caused by gc* wait events.

Alert log example:

2019-12-13T11:18:46.066786+08:00
LMS3 (ospid: 82343_82350) waits for event 'gcs remote message' for 96 secs.
2019-12-13T11:18:48.402828+08:00
Errors in file /u01/app/oracle/diag/rdbms/rac/rac8/trace/rac8_lmhb_82369.trc (incident=2123161) (PDBNAME=CDB$ROOT):
ORA-29770: global enqueue process LMS3 (OSID 82343_82350) is hung for more than 70 seconds
Incident details in: /u01/app/oracle/diag/rdbms/rac/rac8/incident/incdir_2123161/rac8_lmhb_82369_i2123161.trc
LMHB (ospid: 82369): terminating the instance due to ORA error 29770
2019-12-13T11:18:51.381441+08:00
Cause - 'ERROR: Some process(s) is not making progress.
LMHB (ospid: 82369) is terminating the instance.
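
As an illustration only, the alert log signature above could be detected with a short script. This is a minimal sketch: the regular expressions are derived solely from the excerpt shown here, the function name `scan_alert_log` is hypothetical, and message wording may differ between database versions.

```python
import re

# Patterns derived from the alert log excerpt above (assumption: the
# wording is stable across the affected versions).
WAIT_RE = re.compile(
    r"^(?P<proc>LMS\w+) \(ospid: (?P<ospid>[\d_]+)\) waits for event "
    r"'(?P<event>[^']+)' for (?P<secs>\d+) secs\."
)
HUNG_RE = re.compile(
    r"^ORA-29770: global enqueue process (?P<proc>\w+) "
    r"\(OSID (?P<osid>[\d_]+)\) is hung for more than (?P<secs>\d+) seconds"
)


def scan_alert_log(lines):
    """Return (kind, process, seconds) tuples for LMS wait and ORA-29770 lines."""
    findings = []
    for raw in lines:
        line = raw.strip()
        m = WAIT_RE.match(line)
        if m:
            findings.append(("wait", m["proc"], int(m["secs"])))
            continue
        m = HUNG_RE.match(line)
        if m:
            findings.append(("hung", m["proc"], int(m["secs"])))
    return findings


if __name__ == "__main__":
    sample = [
        "LMS3 (ospid: 82343_82350) waits for event 'gcs remote message' for 96 secs.",
        "ORA-29770: global enqueue process LMS3 (OSID 82343_82350) is hung for more than 70 seconds",
    ]
    for finding in scan_alert_log(sample):
        print(finding)
```

In practice the lines would come from the instance alert log file rather than an in-memory list.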

The LMHB trace file contains messages like:

LMS3 (ospid: 82343_82350) has no heartbeats for 99 sec. (threshold 70)

service name: SYS$BACKGROUND
Current Wait Stack:
0: waiting for 'gcs remote message'
waittime=0x1e, poll=0x0, event=0xec
wait_id=667923 seq_num=12579 snap_id=1
wait times: snap=1 min 38 sec, exc=1 min 38 sec, total=1 min 38 sec
wait times: max=infinite, heur=1 min 40 sec
wait counts: calls=1 os=1
in_wait=1 iflags=0x5a0
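
The heartbeat line in the LMHB trace carries both the elapsed time and the threshold, so the overshoot can be computed directly. The following is a hedged sketch, assuming the line format matches the excerpt above; the helper name `missed_heartbeats` is hypothetical and not part of any Oracle tool.

```python
import re

# Pattern derived from the LMHB trace excerpt above (assumption: the
# format is the same in other traces from the affected versions).
HEARTBEAT_RE = re.compile(
    r"(?P<proc>\w+) \(ospid: (?P<ospid>[\d_]+)\) has no heartbeats for "
    r"(?P<secs>\d+) sec\. \(threshold (?P<threshold>\d+)\)"
)


def missed_heartbeats(trace_text):
    """Return one dict per heartbeat warning found in the trace text."""
    results = []
    for m in HEARTBEAT_RE.finditer(trace_text):
        secs = int(m["secs"])
        threshold = int(m["threshold"])
        results.append(
            {
                "process": m["proc"],
                "seconds": secs,
                "threshold": threshold,
                # How far past the threshold the process has been silent.
                "over_by": secs - threshold,
            }
        )
    return results


if __name__ == "__main__":
    text = "LMS3 (ospid: 82343_82350) has no heartbeats for 99 sec. (threshold 70)"
    print(missed_heartbeats(text))
```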

The LMS trace files contain messages like:

IPCLW:[0.2189]{E}[WAIT]:RC: [1568996782296263]Unknown reject data for cnh 0x7f80a91965f8 from xxx.xxx.xx.203:44469. Discarding and attempting re-connect.

Cause




