Exadata: Cellsrv Service Not Restarting After Machine Reboot

(Doc ID 1917968.1)

Last updated on AUGUST 25, 2014

Applies to:

Oracle Exadata Storage Server Software - Version 11.2.1.3.0 to 12.1.1.1.1 [Release 11.2 to 12.1]
Information in this document applies to any platform.

Symptoms

The cellsrv service fails to start after hardware maintenance on a cell node. The following symptoms are also present:

---> The cell alert.log shows content similar to:

[RS] Process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrsomt (pid: 10196) received exception [signal num: 14] [ADDR:0x0]
Sun Aug 17 12:52:06 2014
Sun Aug 17 12:52:06 2014State dump completed for Cellsrv<10199>
Sun Aug 17 12:52:06 2014
State dump signal delivered to Cellsrv<10199> by RS.
Sun Aug 17 12:52:11 2014
State dump interrupted for Cellsrv<10199> by RS.  It did not complete in 5 seconds.
Clean shutdown signal delivered to OSS<10199>
[RS] monitoring process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrsomt (pid: 0) returned with error: 124

---> The ms-odl.trc file shows content similar to:

[2014-08-17T12:57:54.082-07:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.osadp.MSLnx1OSAdapterImpl] [tid: 13] [ecid: 10.20.100.154:63599:1408305140549:3,0] Error occurred during IBPort population.[[
oracle.ossmgmt.ms.core.MSCell$ExecSageException: CELL-02623: The command "/usr/sbin/ibhosts" returned an error code 255.
    at oracle.ossmgmt.ms.core.MSCell.returnCmd(MSCell.java:2575)
    at oracle.ossmgmt.ms.core.MSCell.returnCmd(MSCell.java:2514)
    at oracle.ossmgmt.ms.core.MSCell.returnCmd(MSCell.java:2486)
    at oracle.ossmgmt.ms.hwadapter.osadp.MSLnx1OSAdapterImpl.populateIBPorts(MSLnx1OSAdapterImpl.java:821)
    at oracle.ossmgmt.ms.hwadapter.osadp.MSOSStatsLinux.getNetStats(MSOSStatsLinux.java:1063)
    at oracle.ossmgmt.ms.core.MSIDBPlanMetricDef.collectNetNicStats(MSIDBPlanMetricDef.java:2150)
    at oracle.ossmgmt.ms.core.MSIDBPlanMetricDef.collect(MSIDBPlanMetricDef.java:1881)
    at oracle.ossmgmt.ms.core.MSIDBPlanMetricTimerTask.run(MSIDBPlanMetricTimerTask.java:89)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)
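
To check a trace file for this symptom, the CELL-02623 entries can be pulled out with a short script. The following is a minimal sketch, not an Oracle-supplied tool: the function name `failed_commands` is hypothetical, and the pattern assumes the message wording shown in the excerpt above.

```python
import re

# Pattern assumes the CELL-02623 wording shown in the trace excerpt above.
CELL_02623 = re.compile(
    r'CELL-02623: The command "([^"]+)" returned an error code (\d+)'
)

def failed_commands(trace_text):
    """Return (command, exit_code) pairs for each CELL-02623 entry found."""
    return [(m.group(1), int(m.group(2))) for m in CELL_02623.finditer(trace_text)]

# Sample line copied from the ms-odl.trc excerpt in this note.
SAMPLE_TRACE = (
    'oracle.ossmgmt.ms.core.MSCell$ExecSageException: CELL-02623: '
    'The command "/usr/sbin/ibhosts" returned an error code 255.\n'
)

if __name__ == "__main__":
    for command, code in failed_commands(SAMPLE_TRACE):
        print(f"{command} failed with exit code {code}")
```

A failing `/usr/sbin/ibhosts` entry in the result matches the symptom described in this note; any other command reported here may point to a different problem.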

---> The ifconfig output shows bondib0 but does NOT show the ib0 (or ib1) interface, similar to:

bondib0   Link encap:Ethernet  HWaddr 00:00:00:00:00:00
         inet addr:192.168.10.5  Bcast:192.168.11.255  Mask:255.255.254.0
         UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
         TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

eth0      Link encap:Ethernet  HWaddr 00:10:E0:0D:8E:DA
         inet addr:10.20.100.154  Bcast:10.20.101.255  Mask:255.255.254.0
         inet6 addr: fe80::210:e0ff:fe0d:8eda/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:6244 errors:0 dropped:0 overruns:0 frame:0
         TX packets:8425 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:783377 (765.0 KiB)  TX bytes:2721079 (2.5 MiB)
         Memory:ddc60000-ddc80000

lo        Link encap:Local Loopback
         inet addr:127.0.0.1  Mask:255.0.0.0
         inet6 addr: ::1/128 Scope:Host
         UP LOOPBACK RUNNING  MTU:16436  Metric:1
         RX packets:13101 errors:0 dropped:0 overruns:0 frame:0
         TX packets:13101 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:1681999 (1.6 MiB)  TX bytes:1681999 (1.6 MiB)
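
The same check can be made against saved ifconfig output. The following is a minimal sketch under stated assumptions: the function names `interface_names` and `missing_ib_interfaces` are hypothetical, the parser assumes the classic net-tools layout shown above (interface name at the start of the line, followed by `Link encap:`), and it assumes ib0 and ib1 are the interfaces expected alongside bondib0.

```python
import re

def interface_names(ifconfig_text):
    """Return interface names found in classic net-tools ifconfig output."""
    names = []
    for line in ifconfig_text.splitlines():
        # A stanza starts in column 0; continuation lines are indented.
        match = re.match(r"^([A-Za-z0-9_.:-]+)\s+Link encap:", line)
        if match:
            names.append(match.group(1))
    return names

def missing_ib_interfaces(ifconfig_text, bond="bondib0", expected=("ib0", "ib1")):
    """List expected InfiniBand interfaces that are absent while the bond is present."""
    names = interface_names(ifconfig_text)
    if bond not in names:
        return []  # Bond itself is absent, so this particular symptom does not apply.
    return [name for name in expected if name not in names]

# Abbreviated sample based on the ifconfig output shown in this note.
SAMPLE_IFCONFIG = """\
bondib0   Link encap:Ethernet  HWaddr 00:00:00:00:00:00
         inet addr:192.168.10.5  Bcast:192.168.11.255  Mask:255.255.254.0
eth0      Link encap:Ethernet  HWaddr 00:10:E0:0D:8E:DA
         inet addr:10.20.100.154  Bcast:10.20.101.255  Mask:255.255.254.0
lo        Link encap:Local Loopback
         inet addr:127.0.0.1  Mask:255.0.0.0
"""

if __name__ == "__main__":
    print(missing_ib_interfaces(SAMPLE_IFCONFIG))
```

A non-empty result means bondib0 is up without one or both of the listed interfaces, which is the condition this note describes.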

Changes

Hardware maintenance on the system (such as a battery replacement) may have caused a seating issue with the InfiniBand hardware.

Cause
