HAIP Cannot Fail Over To An Available NIC When The Cable To The Current NIC Is Broken

(Doc ID 2383905.1)

Last updated on MAY 07, 2018

Applies to:

Oracle Database - Enterprise Edition - Version 12.2.0.1 to 12.2.0.1 [Release 12.2]
Information in this document applies to any platform.

Symptoms

This issue was found during testing. In a cluster with two private NICs (p6p1 and p7p1), the network cable to one of the NICs was unplugged to simulate a cable failure. The HAIP address on the unplugged NIC did not fail over to the other available NIC. The symptoms are as follows:

1. The OS log shows the NIC link-down message:

Feb 13 15:53:26 TEST1 kernel: ixgbe 0000:81:00.0: p6p1: NIC Link is Down <<<<<< Cable unplugged

2. No abnormal errors or warnings appear in ohasd_orarootagent_root.trc while the issue occurs. For comparison, in a separate test where the NIC was shut down with "ifconfig p6p1 down", the same ohasd_orarootagent_root.trc does show the HAIP failover messages:

2018-02-13 15:46:08.017 : USRTHRD:2613040896: HAIP: event GIPCD_IF_UPDATE <<<<<< Event GIPCD_IF_UPDATE received after running "ifconfig p6p1 down"
2018-02-13 15:46:08.018 : USRTHRD:2615142144: {0:0:4601} dequeue change event 0x7f67940639f0, GIPCD_IF_UPDATE
2018-02-13 15:46:08.018 : USRTHRD:2615142144: {0:0:4601} HAIP: IF state gipcdadapterstateDown 
2018-02-13 15:46:08.018 : USRTHRD:2615142144: {0:0:4601} HAIP: remove inf 1,  0x7f678c06ce60

......

2018-02-13 15:46:08.943 : USRTHRD:2615142144: {0:0:4601} HAIP: Moving ip '169.254.124.181' from inf 'p6p1' to inf 'p7p1'  <<<<<< HAIP address failed over to the other available NIC (p7p1)
2018-02-13 15:46:08.943 : USRTHRD:2615142144: {0:0:4601} pausing thread
2018-02-13 15:46:08.943 : USRTHRD:2615142144: {0:0:4601} posting thread
2018-02-13 15:46:08.943 : USRTHRD:2615142144: {0:0:4601} Thread::start { acquire thndMX:8c1e0ba0
2018-02-13 15:46:08.943 : USRTHRD:2615142144: {0:0:4601} Thread::start spawn pThnd:0x7f678c39f010 thndType:1
2018-02-13 15:46:08.943 : USRTHRD:2615142144: {0:0:4601} Thread::start thread spawned tid:2579191552
2018-02-13 15:46:08.943 : USRTHRD:2615142144: {0:0:4601} Thread::start spawned release thndMX:8c1e0ba0 }
2018-02-13 15:46:08.943 : USRTHRD:2615142144: {0:0:4601} to verify routes
2018-02-13 15:46:08.943 : USRTHRD:2615142144: {0:0:4601} to verify start completion 2
2018-02-13 15:46:08.943 : USRTHRD:2615142144: {0:0:4601} HAIP: check ip '169.254.124.181'
2018-02-13 15:46:08.943 : USRTHRD:2615142144: {0:0:4601} HAIP: CleanDeadThreads entry 

3. After the cable is unplugged, the NIC still shows the "UP" flag but not the "RUNNING" flag:

p6p1 Link encap:Ethernet HWaddr XX:XX:XX:XX:XX:XX
inet addr:XX.XX.XX.XX Bcast:20.20.20.255 Mask:255.255.255.0
inet6 addr: XX::XX:XX:XX:6188/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1 <<<<<< No RUNNING flag 
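The three symptoms above can be checked together. The following is a minimal Python sketch, assuming the kernel log, the ohasd_orarootagent_root.trc content and the ifconfig output have already been collected as text in exactly the formats excerpted in this note. All function names are hypothetical and not part of any Oracle tool. It reports the pattern described here: a "NIC Link is Down" message for the interface, the UP flag without RUNNING, and no "HAIP: Moving ip" line moving the address off that interface.

```python
"""Hypothetical triage helper for the HAIP cable-pull symptoms (not an Oracle utility).

Assumption: inputs are plain strings in the formats shown in this note
(net-tools style ifconfig output, syslog-style kernel messages, GI trace lines).
"""
from __future__ import annotations

import re

# e.g. "Feb 13 15:53:26 TEST1 kernel: ixgbe 0000:81:00.0: p6p1: NIC Link is Down"
LINK_DOWN_RE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:.*?:\s+(\S+):\s+NIC Link is Down"
)
# e.g. "HAIP: Moving ip '169.254.124.181' from inf 'p6p1' to inf 'p7p1'"
HAIP_MOVE_RE = re.compile(
    r"HAIP: Moving ip '([^']+)' from inf '([^']+)' to inf '([^']+)'"
)


def link_down_events(kernel_log: str) -> list[tuple[str, str]]:
    """Return (timestamp, interface) for each 'NIC Link is Down' kernel message."""
    events = []
    for line in kernel_log.splitlines():
        m = LINK_DOWN_RE.search(line)
        if m:
            events.append((m.group(1), m.group(2)))
    return events


def haip_moves(trace: str) -> list[tuple[str, str, str]]:
    """Return (ip, from_interface, to_interface) for each HAIP move in the trace."""
    return [m.groups() for m in HAIP_MOVE_RE.finditer(trace)]


def link_flags(ifconfig_output: str) -> tuple[bool, bool]:
    """Return (UP, RUNNING) from the flags line (the line containing 'MTU:')."""
    for line in ifconfig_output.splitlines():
        tokens = line.split()
        if any(t.startswith("MTU:") for t in tokens):
            return "UP" in tokens, "RUNNING" in tokens
    raise ValueError("no flags line found in ifconfig output")


def cable_pull_not_failed_over(
    iface: str, kernel_log: str, trace: str, ifconfig_output: str
) -> bool:
    """True when link is down, iface is UP but not RUNNING, and HAIP never left it."""
    link_down = any(i == iface for _, i in link_down_events(kernel_log))
    up, running = link_flags(ifconfig_output)
    moved_off = any(src == iface for _, src, _ in haip_moves(trace))
    return link_down and up and not running and not moved_off
```

With the excerpts above, the cable-pull case (empty trace, no RUNNING flag) returns True, while the "ifconfig p6p1 down" trace containing the "Moving ip" line returns False. Matching log text is fragile: the regular expressions only cover the message formats shown here and may need adjusting for other NIC drivers or Grid Infrastructure versions.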

Changes

Simulating a private NIC failure.

Cause
