
ora.net2.network Resource Startup fails with error "sIsIfRunning ib0 is not RUNNING : 1003" (Doc ID 2941105.1)

Last updated on JUNE 28, 2023

Applies to:

Oracle Database - Enterprise Edition - Version 12.2.0.1 to 19.18.0.0.0 [Release 12.2 to 19]
Information in this document applies to any platform.

Symptoms

 

After Exadata QFSDP (Quarterly Full Stack Download Patch) patching, the ora.net2.network resource fails to start with the following errors:

 

ora.net2.network
ONLINE OFFLINE <Node1> STABLE

CRS-2672: Attempting to start 'ora.net2.network' on '<Node1>'
CRS-5017: The resource action "ora.net2.network start" encountered the following error:
CRS-5008: Invalid attribute value: ib0 for the network interface
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/<Node1>/crs/trace/crsd_orarootagent_root.trc".
CRS-2674: Start of 'ora.net2.network' on '<Node1>' failed
CRS-2679: Attempting to clean 'ora.net2.network' on '<Node1>'
CRS-2681: Clean of 'ora.net2.network' on '<Node1>' succeeded
CRS-4000: Command Start failed, or completed with errors.

// From crsd_orarootagent_root.trc: the following trace entries show that ib0 is down and not in the RUNNING state.

2023-04-04 17:48:26.297 :CLSDYNAM:2764502784: [ora.asmnet2.asmnetwork]{1:9212:2} [check] }changeAttrValue
2023-04-04 17:48:26.297 :CLSDYNAM:2768705280: [ora.asmnet1.asmnetwork]{1:9212:2} [check] changeAttrValue{
2023-04-04 17:48:26.297 :CLSDYNAM:2768705280: [ora.asmnet1.asmnetwork]{1:9212:2} [check] changeAttrValue New Value 1 and Current Value 1 are the same
2023-04-04 17:48:26.297 :CLSDYNAM:2768705280: [ora.asmnet1.asmnetwork]{1:9212:2} [check] }changeAttrValue
2023-04-04 17:48:34.280 : USRTHRD:2760300288: [ INFO] {1:9212:2} Relocating Resource ora.<Node1>.vip
2023-04-04 17:48:34.281 : USRTHRD:2760300288: [ INFO] {1:9212:2} Agent::getNodeName getCSSAttribute
2023-04-04 17:48:34.300 : USRTHRD:2760300288: [ INFO] {1:9212:2} Thread:[VipRelocate:] isRunning is reset to false here
2023-04-04 17:48:34.300 : USRTHRD:2760300288: [ INFO] {1:9212:2} Thread:[VipRelocate:] isFinished set to true
2023-04-04 17:48:39.195 : AGFW:2783414016: [ INFO] {1:9212:2} Agent received the message: RESOURCE_START[ora.net2.network <Node1> 1] ID 4098:1787
2023-04-04 17:48:39.195 : AGFW:2783414016: [ INFO] {1:9212:2} Preparing START command for: ora.net2.network <Node1> 1
2023-04-04 17:48:39.195 : AGFW:2783414016: [ INFO] {1:9212:2} ora.net2.network <Node1> 1 state changed from: OFFLINE to: STARTING
2023-04-04 17:48:39.196 :CLSDYNAM:2766604032: [ora.net2.network]{1:9212:2} [start] (:CLSN00107:) clsn_agent::start {
2023-04-04 17:48:39.196 : USRTHRD:2766604032: [ INFO] {1:9212:2} Agent::refreshAttr m_usrOraEnv:
2023-04-04 17:48:39.196 :CLSDYNAM:2766604032: [ora.net2.network]{1:9212:2} [start] NetworkAgent::init enter {
2023-04-04 17:48:39.196 :CLSDYNAM:2766604032: [ora.net2.network]{1:9212:2} [start] Checking if ib0 Interface is fine, flag=0
2023-04-04 17:48:39.197 :CLSDYNAM:2766604032: [ora.net2.network]{1:9212:2} [start] sIsIfRunning ib0 is not RUNNING : 1003   <== ib0 is not RUNNING
2023-04-04 17:48:39.197 :CLSDYNAM:2766604032: [ora.net2.network]{1:9212:2} [start] Agent::commonStart Exception UserErrorException
2023-04-04 17:48:39.197 :CLSDYNAM:2766604032: [ora.net2.network]{1:9212:2} [start] clsnUtils::error Exception type=2 string=
CRS-5017: The resource action "ora.net2.network start" encountered the following error:
CRS-5008: Invalid attribute value: ib0 for the network interface
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/<Node1>/crs/trace/crsd_orarootagent_root.trc".
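When the trace file is large, the failing check can be located with a short script. The sketch below is a hypothetical Python helper (`find_not_running` is not an Oracle tool) that scans trace text for `sIsIfRunning` failures. It assumes, as an inference rather than documented agent behavior, that the number after the colon is the interface flag word printed in hexadecimal: 0x1003 equals 4099, the same `flags=4099` value that ifconfig reports for ib0, and the RUNNING bit (0x40 in Linux `<net/if.h>`) is clear in that value.

```python
import re

# Linux IFF_RUNNING bit from <net/if.h>.
IFF_RUNNING = 0x40

# Matches agent trace text such as "sIsIfRunning ib0 is not RUNNING : 1003".
NOT_RUNNING_RE = re.compile(r"sIsIfRunning (\S+) is not RUNNING : ([0-9A-Fa-f]+)")


def find_not_running(trace_text: str) -> list[tuple[str, int, bool]]:
    """Return (interface, flags, running_bit_set) for each sIsIfRunning failure.

    Assumption (hypothetical interpretation): the trailing code is the
    interface flag word printed in hexadecimal.
    """
    results = []
    for match in NOT_RUNNING_RE.finditer(trace_text):
        iface, code = match.groups()
        flags = int(code, 16)
        results.append((iface, flags, bool(flags & IFF_RUNNING)))
    return results
```

Fed the contents of crsd_orarootagent_root.trc shown above, the helper reports ib0 with flags 0x1003 (4099) and the RUNNING bit not set.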

 

// From ifconfig output

ib0: flags=4099<UP,BROADCAST,MULTICAST> mtu 7000   <== ib0 is not RUNNING and, unlike ib1, has no IP address/netmask configured
infiniband 80:00:02:08:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 txqueuelen 256 (InfiniBand)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ib1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 7000
inet 192.<XXX>.<YY>.2 netmask <XXX.XXX.YYY.0> broadcast 192.<XX>.<XX>.255
inet6 fe80::210:e000:144:33fa prefixlen 64 scopeid 0x20<link>
infiniband 80:00:02:09:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 txqueuelen 256 (InfiniBand)
RX packets 5507310 bytes 1645407439 (1.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6372738 bytes 4122362882 (3.8 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ib1:P01: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 7000
inet 192.<XXX>.<YY>.1 netmask <XXX.XXX.YYY.0> broadcast 192.<XX>.<XX>.255
infiniband 80:00:02:09:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 txqueuelen 256 (InfiniBand)
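ifconfig prints the interface flags both as a decimal number and as decoded names. A minimal Python sketch, assuming the standard Linux flag bit values from `<net/if.h>` (only a subset is listed), decodes them: 4099 (0x1003) lacks the RUNNING bit, whereas 4163 (0x1043) includes it, which is exactly the difference between ib0 and ib1 above. The helper names are hypothetical.

```python
import re

# Subset of the Linux interface flag bits from <net/if.h>, in ascending bit order.
IFF_BITS = [
    (0x1, "UP"),
    (0x2, "BROADCAST"),
    (0x8, "LOOPBACK"),
    (0x10, "POINTOPOINT"),
    (0x40, "RUNNING"),
    (0x80, "NOARP"),
    (0x100, "PROMISC"),
    (0x200, "ALLMULTI"),
    (0x1000, "MULTICAST"),
]

# Header line of each interface block, e.g. "ib1:P01: flags=4163<...> mtu 7000".
FLAGS_RE = re.compile(r"^(\S+): flags=(\d+)<", re.MULTILINE)


def decode_flags(value: int) -> list[str]:
    """Return the names of the listed flag bits that are set in value."""
    return [name for bit, name in IFF_BITS if value & bit]


def interface_flags(ifconfig_text: str) -> dict[str, int]:
    """Map each interface name in ifconfig output to its numeric flags value."""
    return {m.group(1): int(m.group(2)) for m in FLAGS_RE.finditer(ifconfig_text)}


def not_running(ifconfig_text: str) -> list[str]:
    """Interfaces that are administratively UP but lack the RUNNING bit."""
    result = []
    for name, flags in interface_flags(ifconfig_text).items():
        names = decode_flags(flags)
        if "UP" in names and "RUNNING" not in names:
            result.append(name)
    return result
```

Applied to the ifconfig output above, `not_running` would return only ib0.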

// From ibstat and ibstatus output

#HEADER:Output of /sbin/ibstat
CA 'mlx4_0'
CA type: MT4099
Number of ports: 2
Firmware version: 2.42.5514
Hardware version: 1
Node GUID: 0x0010e000014433f8
System image GUID: 0x0010e000014433fb
Port 1:
State: Down   <== port 1 is Down
Physical state: Polling   <== physical link not established
Rate: 10
Base lid: 0   <== no LID assigned
LMC: 0
SM lid: 0   <== no subnet manager seen on this port
Capability mask: 0x02514868
Port GUID: 0x0010e000014433f9
Link layer: InfiniBand
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 53
LMC: 0
SM lid: 77
Capability mask: 0x02514868
Port GUID: 0x0010e000014433fa
Link layer: InfiniBand

#HEADER:Output of /sbin/ibstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:0010:e000:0144:33f9
base lid: 0x0
sm lid: 0x0
state: 1: DOWN   <== port 1 is DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X)
link_layer: InfiniBand

Infiniband device 'mlx4_0' port 2 status:
default gid: fe80:0000:0000:0000:0010:e000:0144:33fa
base lid: 0x35
sm lid: 0x4d
state: 4: ACTIVE
phys state: 5: LinkUp
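To flag HCA ports that are not Active without reading the whole listing, the ibstat output can be parsed. This Python sketch assumes single-HCA output laid out like the mlx4_0 listing above (`Port N:` headers followed by `State:` and `Physical state:` lines, indentation ignored); with several HCAs the port numbers would collide, so it is only a starting point. The function names are hypothetical.

```python
import re

PORT_RE = re.compile(r"Port (\d+):$")


def ibstat_port_states(ibstat_text: str) -> dict[int, tuple[str, str]]:
    """Return {port: (State, Physical state)} parsed from ibstat output."""
    ports: dict[int, list[str]] = {}
    current = None
    for raw in ibstat_text.splitlines():
        line = raw.strip()
        header = PORT_RE.match(line)
        if header:
            # Start of a new "Port N:" section.
            current = int(header.group(1))
            ports[current] = ["", ""]
        elif current is not None and line.startswith("State:"):
            ports[current][0] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Physical state:"):
            ports[current][1] = line.split(":", 1)[1].strip()
    return {port: (state, phys) for port, (state, phys) in ports.items()}


def ports_not_active(ibstat_text: str) -> list[int]:
    """Ports whose logical State is anything other than Active."""
    return [port for port, (state, _) in ibstat_port_states(ibstat_text).items()
            if state != "Active"]
```

For the ibstat output above, port 1 would be reported as ("Down", "Polling") and returned by `ports_not_active`.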

// From the OS message log on <Node1>

Apr 4 17:23:15 <Node1> systemd[1]: Removed slice User Slice of oragrid.
Apr 4 17:23:33 <Node1> kernel: [1511829.834157] rdmaip: NET-EVENT: NETDEV-CHANGE, PORT mlx4_0/port_1/ib0 : port state transition NONE - port retained in state DOWN (portlayers 0xc)   <== ib0 port stays DOWN
Apr 4 17:23:33 <Node1> systemd[1]: Starting Network Service...
Apr 4 17:23:33 <Node1> systemd-networkd[114565]: Enumeration completed
Apr 4 17:23:33 <Node1> systemd[1]: Started Network Service.
Apr 4 17:23:33 <Node1> kernel: [1511829.982991] rdmaip: NET-EVENT: NETDEV-DOWN, PORT mlx4_0/port_1/ib0 : port state transition NONE - port retained in state DOWN (portlayers 0x8)   <== ib0 port stays DOWN
Apr 4 17:24:02 <Node1> su: (to oragrid) root on none
Apr 4 17:24:02 <Node1> systemd[1]: Created slice User Slice of oragrid.
Apr 4 17:24:02 <Node1> systemd[1]: Started Session c35378 of user oragrid.
Apr 4 17:24:02 <Node1> ifup-pre-local: Nothing to do for interface 'ib0'.
Apr 4 17:24:02 <Node1> kernel: [1511859.021607] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
Apr 4 17:24:02 <Node1> kernel: [1511859.021628] rdmaip: NET-EVENT: NETDEV-UP, PORT mlx4_0/port_1/ib0 : port state transition NONE - port retained in state DOWN (portlayers 0xc)
Apr 4 17:45:53 <Node1> kernel: [ 94.834839] rdmaip: Triggering initial failovers(itercount 351)
Apr 4 17:45:53 <Node1> kernel: [ 94.834864] rdmaip_do_initial_failover: port index 1 interface ib0 transitioned from INIT to DOWN state (portlayers 0xc)
Apr 4 17:45:53 <Node1> kernel: [ 94.834867] rdmaip_do_initial_failover: port index 2 interface ib1 transitioned from INIT to UP state (portlayers 0xf)
Apr 4 17:45:53 <Node1> kernel: [ 94.835138] rdmaip: IPv4 192.168.10.1 migrated from ib0 (port 1) to ib1:P01 (port 2)
Apr 4 17:45:53 <Node1> kernel: [ 94.835143] rdmaip: mlx4_0/port_1/ib0: IPv4 192.<XXX>.<YY>.1/192.<XXX>.<YY>.255/<XXX.XXX.YYY.0> Link Status: DOWN   <== link down for ib0
Apr 4 17:45:53 <Node1> kernel: [ 94.835144] rdmaip: mlx4_0/port_2/ib1: IPv4 192.<XXX>.<YY>.2/192.<XXX>.<YY>.255/<XXX.XXX.YYY.0> Link Status: UP
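The rdmaip kernel messages record both the per-port link status and the IP failover from ib0 to ib1:P01. A minimal Python sketch that summarizes them is shown below; the regular expressions are fitted to the sample lines above, not to a documented message format, and the helper name is hypothetical.

```python
import re

# e.g. "rdmaip: mlx4_0/port_1/ib0: IPv4 .../.../... Link Status: DOWN"
RDMAIP_LINK_RE = re.compile(r"rdmaip: (\S+?)/port_(\d+)/(\S+?): IPv4 .*?Link Status: (\w+)")
# e.g. "rdmaip: IPv4 192.168.10.1 migrated from ib0 (port 1) to ib1:P01 (port 2)"
RDMAIP_MIGRATE_RE = re.compile(
    r"rdmaip: IPv4 (\S+) migrated from (\S+) \(port \d+\) to (\S+) \(port \d+\)")


def rdmaip_summary(log_text: str) -> tuple[dict[str, tuple[str, int, str]],
                                           list[tuple[str, str, str]]]:
    """Return ({interface: (device, port, link_status)}, [(ip, from_if, to_if)])."""
    links: dict[str, tuple[str, int, str]] = {}
    migrations: list[tuple[str, str, str]] = []
    for line in log_text.splitlines():
        link = RDMAIP_LINK_RE.search(line)
        if link:
            device, port, iface, status = link.groups()
            # A later message for the same interface overwrites the earlier one.
            links[iface] = (device, int(port), status)
            continue
        moved = RDMAIP_MIGRATE_RE.search(line)
        if moved:
            migrations.append(moved.groups())
    return links, migrations
```

Run over the messages above, it would report ib0 as DOWN on mlx4_0 port 1, ib1 as UP on port 2, and one migration of 192.168.10.1 from ib0 to ib1:P01.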

// From the InfiniBand switch link status (listlinkup output)
// From [root@<Switch>-ibb0
# listlinkup

Connector 0A Not present
Connector 1A Not present
Connector 2A Not present
Connector 3A Not present
Connector 4A Not present
Connector 5A Not present
Connector 6A Not present
Connector 7A Not present
Connector 8A Present <-> Switch Port 31 is up (Enabled)
Connector 9A Present <-> Switch Port 14 is up (Enabled)
Connector 10A Present <-> Switch Port 16 is up (Enabled)
Connector 11A Present <-> Switch Port 18 is up (Enabled)
Connector 12A Not present
Connector 13A Present <-> Switch Port 09 is up (Enabled)
Connector 14A Present <-> Switch Port 07 is up (Enabled)
Connector 15A Present <-> Switch Port 05 is up (Enabled)
Connector 16A Present <-> Switch Port 03 is up (Enabled)
Connector 17A Present <-> Switch Port 01 is up (Enabled)
Connector 0B Not present
Connector 1B Not present
Connector 2B Not present
Connector 3B Not present
Connector 4B Not present
Connector 5B Not present
Connector 6B Not present
Connector 7B Not present
Connector 8B Present <-> Switch Port 32 is up (Enabled)
Connector 9B Present <-> Switch Port 13 is up (Enabled)
Connector 10B Present <-> Switch Port 15 is up (Enabled)
Connector 11B Present <-> Switch Port 17 is up (Enabled)
Connector 12B Present <-> Switch Port 12 is up (Enabled)
Connector 13B Present <-> Switch Port 10 is up (Enabled)
Connector 14B Not present
Connector 15B Present <-> Switch Port 06 is down (Enabled)   <== Down
Connector 16B Present <-> Switch Port 04 is up (Enabled)
Connector 17B Present <-> Switch Port 02 is up (Enabled)

[root@ftdexad01sw-iba01 ~]#

//[root@<Switch>-ibb0 ~]# ssh <switch>-ibb01

FW upgrade completed successfully on Sat Mar 19 23:50:26 CDT 2022.
Please run the "fwverify" CLI command to verify the new image.
This message will be cleared on next reboot.
You are now logged in to the root shell.
It is recommended to use ILOM shell instead of root shell.
All usage should be restricted to documented commands and documented config files.
To view the list of documented commands, use "help" at linux prompt.
[root@ftdexad01sw-ibb01 ~]#
[root@ftdexad01sw-ibb01 ~]# listlinkup

Connector 0A Not present
Connector 1A Not present
Connector 2A Not present
Connector 3A Not present
Connector 4A Not present
Connector 5A Not present
Connector 6A Not present
Connector 7A Not present
Connector 8A Present <-> Switch Port 31 is up (Enabled)
Connector 9A Present <-> Switch Port 14 is up (Enabled)
Connector 10A Present <-> Switch Port 16 is up (Enabled)
Connector 11A Present <-> Switch Port 18 is up (Enabled)
Connector 12A Not present
Connector 13A Present <-> Switch Port 09 is up (Enabled)
Connector 14A Present <-> Switch Port 07 is up (Enabled)
Connector 15A Present <-> Switch Port 05 is up (Enabled)
Connector 16A Present <-> Switch Port 03 is up (Enabled)
Connector 17A Present <-> Switch Port 01 is up (Enabled)
Connector 0B Not present
Connector 1B Not present
Connector 2B Not present
Connector 3B Not present
Connector 4B Not present
Connector 5B Not present
Connector 6B Not present
Connector 7B Not present
Connector 8B Present <-> Switch Port 32 is up (Enabled)
Connector 9B Present <-> Switch Port 13 is up (Enabled)
Connector 10B Present <-> Switch Port 15 is up (Enabled)
Connector 11B Present <-> Switch Port 17 is up (Enabled)
Connector 12B Present <-> Switch Port 12 is down (Enabled)   <== Down
Connector 13B Present <-> Switch Port 10 is up (Enabled)
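On the switch, a connector that is `Present` with its port `down` while still `(Enabled)` suggests that the link failed to come up even though the port is not administratively disabled, which is consistent with the Down/Polling state seen on the HCA. The hypothetical Python helper below, fitted to the listlinkup lines above rather than to a documented format, lists such connectors.

```python
import re

# e.g. "Connector 15B Present <-> Switch Port 06 is down (Enabled)"
SWITCH_LINK_RE = re.compile(
    r"Connector (\w+) Present <-> Switch Port (\d+) is (up|down) \((\w+)\)")


def down_links(listlinkup_text: str) -> list[tuple[str, int, str]]:
    """Return (connector, switch_port, admin_state) for cabled links that are down."""
    result = []
    for match in SWITCH_LINK_RE.finditer(listlinkup_text):
        connector, port, link_state, admin_state = match.groups()
        if link_state == "down":
            result.append((connector, int(port), admin_state))
    return result
```

For the two listings above, it would return connector 15B (switch port 6) on the first switch and connector 12B (switch port 12) on the second.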

 

 



Changes

Exadata quarterly patch (QFSDP) applied.

Cause



