Database node in Exadata becomes inaccessible (Doc ID 1412783.1)

Last updated on DECEMBER 11, 2013

Applies to:

Oracle Exadata Hardware - Version 11.2.0.1 and later
Linux x86-64

Symptoms

One of the database nodes may suddenly become inaccessible. Power-cycling the node through its ILOM makes it available again.

The OS Watcher ps output shows the krdsd kernel thread in state 'D' (uninterruptible sleep):

zzz ***Fri Jan 13 22:10:27 EET 2012
1 D root 4099 659 1 70 -5 - 0 - 01:10 ? 00:14:48 [krdsd]
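When many OS Watcher ps snapshots need to be checked, a small script can flag the ones where krdsd is stuck in state 'D'. The sketch below assumes the `ps -elf` column layout seen in the excerpt (F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD) and that each snapshot starts with a "zzz" timestamp line. The function name `d_state_processes` is hypothetical and not part of OS Watcher.

```python
# Sample OS Watcher ps snapshot, copied from the excerpt above.
SAMPLE = """\
zzz ***Fri Jan 13 22:10:27 EET 2012
1 D root 4099 659 1 70 -5 - 0 - 01:10 ? 00:14:48 [krdsd]
"""


def d_state_processes(text, name="krdsd"):
    """Return (snapshot_header, pid) pairs where `name` is in state 'D'.

    Assumes `ps -elf`-style columns: the state is field 1, the PID is
    field 3, and the command starts at field 14.
    """
    hits = []
    header = None
    for line in text.splitlines():
        if line.startswith("zzz"):
            header = line  # OS Watcher timestamp marker for this snapshot
            continue
        fields = line.split()
        if len(fields) < 15:
            continue  # not a process line
        state, pid = fields[1], fields[3]
        cmd = " ".join(fields[14:])
        # Kernel threads are shown in brackets, e.g. [krdsd].
        if state == "D" and cmd.strip("[]") == name:
            hits.append((header, int(pid)))
    return hits


for header, pid in d_state_processes(SAMPLE):
    print(f"{header}: krdsd (pid {pid}) in uninterruptible sleep")
```

Matching on the whole command rather than a substring avoids flagging unrelated processes whose names merely contain "krdsd".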


The OS Watcher ExadataRDS output shows:

===/usr/bin/rds-info:
LocalAddr      RemoteAddr      LocalDev                    RemoteDev
192.168.84.35  192.168.84.104  fe80::21:2800:13e:a057      fe80::21:2800:13f:d57
192.168.84.35  192.168.84.111  ::                          ::
192.168.84.35  192.168.84.110  fe80::21:2800:13e:a057      fe80::21:2800:13f:16e3

Notice the missing hardware (HW) addresses for the connection to 192.168.84.111: both LocalDev and RemoteDev show only "::".
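The same check can be automated across ExadataRDS snapshots. This is a minimal sketch that assumes the four-column LocalAddr/RemoteAddr/LocalDev/RemoteDev layout shown above, with "::" (the unspecified IPv6 address) marking a missing device address. The function name `connections_without_hw_address` is hypothetical.

```python
# Sample rds-info output, copied from the excerpt above.
SAMPLE = """\
LocalAddr      RemoteAddr      LocalDev                    RemoteDev
192.168.84.35  192.168.84.104  fe80::21:2800:13e:a057      fe80::21:2800:13f:d57
192.168.84.35  192.168.84.111  ::                          ::
192.168.84.35  192.168.84.110  fe80::21:2800:13e:a057      fe80::21:2800:13f:16e3
"""


def connections_without_hw_address(text):
    """Return (local, remote) address pairs whose LocalDev or RemoteDev is '::'."""
    suspect = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 4-column data row.
        if len(fields) != 4 or fields[0] == "LocalAddr":
            continue
        local, remote, local_dev, remote_dev = fields
        # Tuple membership compares whole values, so "fe80::..." does not match.
        if "::" in (local_dev, remote_dev):
            suspect.append((local, remote))
    return suspect


for local, remote in connections_without_hw_address(SAMPLE):
    print(f"{local} -> {remote}: no HW address")
```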


The messages file shows:

Jan 13 22:07:26 cbapdb06 kernel: RDS/IB: recv completion on 192.168.84.111 had status 5, disconnecting and reconnecting
Jan 13 22:10:03 cbapdb06 kernel: type=1101 audit(1326485403.041:9173): user pid=9916 uid=0 auid=4294967295 msg='PAM: accounting acct="root" : exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron res=success)'
Jan 13 22:10:03 cbapdb06 kernel: type=1101 audit(1326485403.061:9174): user pid=9918 uid=0 auid=4294967295 msg='PAM: accounting acct="cpmon" : exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron res=success)'
Jan 13 22:10:03 cbapdb06 kernel: type=1103 audit(1326485403.087:9175): user pid=9916 uid=0 auid=4294967295 msg='PAM: setcred acct="root" : exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron res=success)'
Jan 13 22:10:03 cbapdb06 kernel: type=1006 audit(1326485403.118:9176): login pid=9916 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=1271
Jan 13 22:10:03 cbapdb06 kernel: type=1103 audit(1326485403.118:9177): user pid=9918 uid=0 auid=4294967295 msg='PAM: setcred acct="cpmon" : exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron res=success)'
Jan 13 22:10:03 cbapdb06 kernel: type=1006 audit(1326485403.118:9178): login pid=9918 uid=0 old auid=4294967295 new auid=1001 old ses=4294967295 new ses=1272
Jan 13 22:10:04 cbapdb06 kernel: type=1105 audit(1326485404.075:9179): user pid=9916 uid=0 auid=0 msg='PAM: session open acct="root" : exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron res=success)'
Jan 13 22:10:04 cbapdb06 kernel: type=1105 audit(1326485404.109:9180): user pid=9918 uid=0 auid=1001 msg='PAM: session open acct="cpmon" : exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron res=success)'
Jan 13 22:10:04 cbapdb06 kernel: type=1101 audit(1326485404.485:9181): user pid=9917 uid=0 auid=4294967295 msg='PAM: accounting acct="cpmon" : exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron res=success)'
Jan 13 22:10:04 cbapdb06 kernel: type=1103 audit(1326485404.513:9182): user pid=9917 uid=0 auid=4294967295 msg='PAM: setcred acct="cpmon" : exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron res=success)'
Jan 13 22:10:37 cbapdb06 kernel: SysRq : Resetting
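To pull the relevant lines out of a long messages file, a short filter can collect the RDS/IB receive-completion errors together with any SysRq reset. In this excerpt the error names 192.168.84.111, the same peer that has no device addresses in the rds-info output above. The sketch assumes the standard syslog prefix ("Mon DD HH:MM:SS host kernel: ...") and the exact message wording shown in the excerpt; the regular expression and the function name `rds_errors_and_resets` are hypothetical.

```python
import re

# Matches the RDS/IB error wording seen in the excerpt above.
RDS_RECV_ERROR = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) kernel: RDS/IB: recv completion on "
    r"(?P<peer>\d+\.\d+\.\d+\.\d+) had status (?P<status>\d+)"
)

SAMPLE = """\
Jan 13 22:07:26 cbapdb06 kernel: RDS/IB: recv completion on 192.168.84.111 had status 5, disconnecting and reconnecting
Jan 13 22:10:03 cbapdb06 kernel: type=1101 audit(1326485403.041:9173): user pid=9916 uid=0 auid=4294967295 msg='PAM: accounting acct="root" : exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron res=success)'
Jan 13 22:10:37 cbapdb06 kernel: SysRq : Resetting
"""


def rds_errors_and_resets(text):
    """Return (errors, resets) found in syslog-formatted text.

    errors: list of (timestamp, peer_ip, status) for RDS/IB recv errors.
    resets: list of "timestamp host" prefixes for SysRq reset lines.
    """
    errors, resets = [], []
    for line in text.splitlines():
        m = RDS_RECV_ERROR.match(line)
        if m:
            errors.append((m["ts"], m["peer"], int(m["status"])))
        elif "SysRq : Resetting" in line:
            resets.append(line.split(" kernel:")[0])
    return errors, resets


errors, resets = rds_errors_and_resets(SAMPLE)
print(errors)
print(resets)
```

Audit and other unrelated kernel lines are simply skipped, so the filter can be run over the whole file.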

Cause
