X4 Exadata + Exalogic active/active: restarting an IB switch caused VIPs to fail over on an Exadata DB node.

(Doc ID 2148988.1)

Last updated on June 19, 2017

Applies to:

Oracle Exadata Storage Server Software - Version 12.1.2.1.3 and later
Information in this document applies to any platform.
An Eighth Rack X4-2 Exadata and a Half Rack X2-2 Exalogic connected together.
Exalogic switches have priority 5 and controlled handover set to true.
Exadata switches have priority 2 and controlled handover set to false.

Goal

In an X4 Exadata + Exalogic active/active configuration, restarting an IB switch caused VIPs to fail over on an Exadata DB node.

In the current configuration, InfiniBand on the X4 Exadata is configured as active/active. If ib0 is disabled on node 1, the VIPs fail over to node 2 and the service resource stops.
In this configuration the VIPs should not need to fail over and fail back, so the current behavior causes an unnecessary outage.

The customer rebooted one IB switch during planned maintenance. This caused the IB VIPs to fail over to another node, which in turn caused an application outage. This is not supposed to happen on an X4 configured as active/active.



X4 - Exadata+Exalogic - active/active IB (ib0, ib1) - KO
X2 - Exadata+Exalogic - bonded IB - OK


ora.xyz2.xyzwork on xyzabc01 goes OFFLINE while the IB switch is being rebooted
===============================================================================

2016-03-11 05:19:12.184327 :    AGFW:2749359872: {0:5:16} ora.xyz2.xyzwork xyzabc01 1 state changed from: ONLINE to: OFFLINE
2016-03-11 05:19:12.184341 :    AGFW:2749359872: {0:5:16} Switching online monitor to offline one
2016-03-11 05:19:12.184377 :    AGFW:2749359872: {0:5:16} Starting offline monitor
2016-03-11 05:19:12.184416 :    AGFW:2749359872: {0:5:16} Started implicit monitor for [ora.xyz2.xyzwork xyzabc01 1] interval=60000 delay=60000
2016-03-11 05:19:12.184434 :    AGFW:2749359872: {0:5:60} Generating new Tint for unplanned state change. Original Tint: {0:5:16}

The VIP check on the ib0 interface fails, which triggers failover
=================================================================

2016-03-11 05:19:12.665206 :CLSDYNAM:3094865664: [xdaa-xyzabc11gvip]{0:43:6} [check] Failed to check 192.168.224.39 on ib0             <<<<<<<<<<<<<
2016-03-11 05:19:12.665221 :CLSDYNAM:3094865664: [xdaa-xyzabc11gvip]{0:43:6} [check] (null) category: 0, operation: , loc: , OS error: 0, other:
2016-03-11 05:19:12.665238 :CLSDYNAM:3094865664: [xdaa-xyzabc11gvip]{0:43:6} [check] VipAgent::checkIp returned false
2016-03-11 05:19:12.665535 :    AGFW:2749359872: {0:43:6} xdaa-xyzabc11gvip 1 1 state changed from: ONLINE to: OFFLINE                    <<<<<<<<<<<<<
2016-03-11 05:19:12.665583 :    AGFW:2749359872: {0:5:61} Generating new Tint for unplanned state change. Original Tint: {0:43:6}

2016-03-11 05:19:13.168315 :CLSDYNAM:2728347392: [xdaa-xyzabc12cvip]{0:37:15} [check] Failed to check 192.168.224.40 on ib0            <<<<<<<<<<<<<
2016-03-11 05:19:13.168341 :CLSDYNAM:2728347392: [xdaa-xyzabc12cvip]{0:37:15} [check] (null) category: 0, operation: , loc: , OS error: 0, other:
2016-03-11 05:19:13.168378 :CLSDYNAM:2728347392: [xdaa-xyzabc12cvip]{0:37:15} [check] VipAgent::checkIp returned false
2016-03-11 05:19:13.168865 :    AGFW:2749359872: {0:37:15} xdaa-xyzabc12cvip 1 1 state changed from: ONLINE to: OFFLINE                   <<<<<<<<<<<<<
2016-03-11 05:19:13.168924 :    AGFW:2749359872: {0:5:62} Generating new Tint for unplanned state change. Original Tint: {0:37:15}

2016-03-11 05:19:13.671468 :CLSDYNAM:2751461120: [xdaa-ggvip]{1:8252:38046} [check] Failed to check 192.168.224.35 on ib0              <<<<<<<<<<<<<
2016-03-11 05:19:13.671501 :CLSDYNAM:2751461120: [xdaa-ggvip]{1:8252:38046} [check] (null) category: 0, operation: , loc: , OS error: 0, other:
2016-03-11 05:19:13.671584 :CLSDYNAM:2751461120: [xdaa-ggvip]{1:8252:38046} [check] VipAgent::checkIp returned false
2016-03-11 05:19:13.672082 :    AGFW:2749359872: {1:8252:38046} xdaa-ggvip 1 1 state changed from: ONLINE to: OFFLINE                     <<<<<<<<<<<<<
2016-03-11 05:19:13.672142 :    AGFW:2749359872: {0:5:63} Generating new Tint for unplanned state change. Original Tint: {1:8252:38046}

2016-03-11 05:19:14.173594 :CLSDYNAM:2722043648: [ora.xyzabc01_2.vip]{1:8252:31446} [check] Failed to check 192.168.224.31 on ib0   <<<<<<<<<<<<<
2016-03-11 05:19:14.173609 :CLSDYNAM:2722043648: [ora.xyzabc01_2.vip]{1:8252:31446} [check] (null) category: 0, operation: , loc: , OS error: 0, other:
2016-03-11 05:19:14.173627 :CLSDYNAM:2722043648: [ora.xyzabc01_2.vip]{1:8252:31446} [check] VipAgent::checkIp returned false
2016-03-11 05:19:14.173927 :    AGFW:2749359872: {1:8252:31446} ora.xyzabc01_2.vip 1 1 state changed from: ONLINE to: OFFLINE          <<<<<<<<<<<<<
2016-03-11 05:19:14.173976 :    AGFW:2749359872: {0:5:64} Generating new Tint for unplanned state change. Original Tint: {1:8252:31446}
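To pull every failed VIP check out of a longer agent log, a simple line filter is enough. The sketch below is a minimal Python example, assuming the line layout shown in the excerpt above (`:CLSDYNAM:<thread>: [<resource>]{...} [check] Failed to check <ip> on <interface>`). The regular expression is inferred from these lines rather than from any documented log format, and `find_vip_check_failures` is a hypothetical helper name.

```python
import re
from typing import NamedTuple

# Pattern inferred from the excerpt above (an assumption, not a documented format):
# "<date> <time> :CLSDYNAM:<tid>: [<resource>]{a:b:c} [check] Failed to check <ip> on <iface>"
CHECK_FAILED = re.compile(
    r"^(?P<ts>\S+ \S+) :CLSDYNAM:\d+: \[(?P<resource>[^\]]+)\]\{[^}]*\} "
    r"\[check\] Failed to check (?P<ip>\S+) on (?P<iface>\S+)"
)


class VipCheckFailure(NamedTuple):
    timestamp: str
    resource: str
    ip: str
    interface: str


def find_vip_check_failures(lines, interface="ib0"):
    """Hypothetical helper: return every failed VIP check reported on `interface`."""
    failures = []
    for line in lines:
        m = CHECK_FAILED.match(line.strip())
        if m and m["iface"] == interface:
            failures.append(VipCheckFailure(m["ts"], m["resource"], m["ip"], m["iface"]))
    return failures


# Sample lines copied from the excerpt above.
sample = [
    "2016-03-11 05:19:12.665206 :CLSDYNAM:3094865664: [xdaa-xyzabc11gvip]{0:43:6} [check] Failed to check 192.168.224.39 on ib0",
    "2016-03-11 05:19:12.665238 :CLSDYNAM:3094865664: [xdaa-xyzabc11gvip]{0:43:6} [check] VipAgent::checkIp returned false",
    "2016-03-11 05:19:14.173594 :CLSDYNAM:2722043648: [ora.xyzabc01_2.vip]{1:8252:31446} [check] Failed to check 192.168.224.31 on ib0",
]

for f in find_vip_check_failures(sample):
    print(f.resource, f.ip, f.interface)
# xdaa-xyzabc11gvip 192.168.224.39 ib0
# ora.xyzabc01_2.vip 192.168.224.31 ib0
```

Matching on the interface name keeps the output focused on ib0, the interface whose check failed during the switch reboot; the `VipAgent::checkIp returned false` lines are skipped because they repeat the same event without the IP.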


By the time the next xyzwork check action runs, the switch has rebooted and ora.xyz2.xyzwork on xyzabc01 goes from OFFLINE to STARTING to ONLINE
================================================================================================================================================

2016-03-11 05:19:15.300306 :    AGFW:2749359872: {0:5:60} Agent received the message: RESOURCE_START[ora.xyz2.xyzwork xyzabc01 1] ID 4098:1705900
2016-03-11 05:19:15.300323 :    AGFW:2749359872: {0:5:60} Preparing START command for: ora.xyz2.xyzwork xyzabc01 1
2016-03-11 05:19:15.300332 :    AGFW:2749359872: {0:5:60} ora.xyz2.xyzwork xyzabc01 1 state changed from: OFFLINE to: STARTING

2016-03-11 05:19:15.326595 :    AGFW:2730448640: {0:5:60} Command: start for resource: ora.xyz2.xyzwork xyzabc01 1 completed with status: SUCCESS

2016-03-11 05:19:15.327633 :    AGFW:2749359872: {0:5:60} ora.xyz2.xyzwork xyzabc01 1 state changed from: STARTING to: ONLINE

2016-03-11 05:19:15.351897 : USRTHRD:2743056128: {0:5:60} Thread:[VipRelocate:] isRunning is reset to false here
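The ONLINE to OFFLINE and STARTING to ONLINE state changes quoted in this note bound how long the service was unavailable. A minimal Python sketch, assuming the `AGFW` state-change line layout shown in these excerpts (the pattern is inferred from them, and `offline_windows` is a hypothetical helper), pairs each transition out of ONLINE with the next transition back to ONLINE for one resource:

```python
import re
from datetime import datetime

# Pattern inferred from the excerpts in this note (an assumption, not a documented format):
# "<date> <time> :    AGFW:<tid>: {a:b:c} <resource> state changed from: <OLD> to: <NEW>"
STATE_CHANGE = re.compile(
    r"^(?P<ts>\S+ \S+) :\s*AGFW:\d+: \{[^}]*\} (?P<resource>.+?) "
    r"state changed from: (?P<old>\w+) to: (?P<new>\w+)"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def offline_windows(lines, resource):
    """Hypothetical helper: list (left_online, back_online, seconds) for `resource`."""
    windows = []
    left_online = None
    for line in lines:
        m = STATE_CHANGE.match(line.strip())
        if not m or m["resource"] != resource:
            continue
        ts = datetime.strptime(m["ts"], TS_FORMAT)
        if m["old"] == "ONLINE" and m["new"] != "ONLINE":
            left_online = ts  # resource left ONLINE (e.g. went OFFLINE)
        elif m["new"] == "ONLINE" and left_online is not None:
            windows.append((left_online, ts, (ts - left_online).total_seconds()))
            left_online = None
    return windows


# Sample lines copied from the excerpts in this note.
sample = [
    "2016-03-11 05:19:12.184327 :    AGFW:2749359872: {0:5:16} ora.xyz2.xyzwork xyzabc01 1 state changed from: ONLINE to: OFFLINE",
    "2016-03-11 05:19:13.672082 :    AGFW:2749359872: {1:8252:38046} xdaa-ggvip 1 1 state changed from: ONLINE to: OFFLINE",
    "2016-03-11 05:19:15.300332 :    AGFW:2749359872: {0:5:60} ora.xyz2.xyzwork xyzabc01 1 state changed from: OFFLINE to: STARTING",
    "2016-03-11 05:19:15.327633 :    AGFW:2749359872: {0:5:60} ora.xyz2.xyzwork xyzabc01 1 state changed from: STARTING to: ONLINE",
]

for start, end, seconds in offline_windows(sample, "ora.xyz2.xyzwork xyzabc01 1"):
    print(f"{start:%H:%M:%S.%f} -> {end:%H:%M:%S.%f}: {seconds:.3f}s")
# 05:19:12.184327 -> 05:19:15.327633: 3.143s
```

Applied to the ora.xyz2.xyzwork lines above, this reports a window of roughly 3.1 seconds. The xdaa-ggvip line is ignored because it belongs to a different resource, and a resource that never returns to ONLINE in the input produces no window.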

=================================================================================


Solution
