Exadata: Cellsrv Service Not Restarting After Machine Reboot
(Doc ID 1917968.1)
Last updated on SEPTEMBER 24, 2021
Applies to:
Oracle Exadata Storage Server Software - Version 11.2.1.3.0 to 12.1.1.1.1 [Release 11.2 to 12.1]
Information in this document applies to any platform.
Symptoms
The cellsrv service fails to start after hardware maintenance on a cell node. The following symptoms are also present:
---> Cell alert.log will reflect similar content to:
[RS] Process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrsomt (pid: 10196) received exception [signal num: 14] [ADDR:0x0]
Sun Aug 17 12:52:06 2014
Sun Aug 17 12:52:06 2014State dump completed for Cellsrv<10199>
Sun Aug 17 12:52:06 2014
State dump signal delivered to Cellsrv<10199> by RS.
Sun Aug 17 12:52:11 2014
State dump interrupted for Cellsrv<10199> by RS. It did not complete in 5 seconds.
Clean shutdown signal delivered to OSS<10199>
[RS] monitoring process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrsomt (pid: 0) returned with error: 124
---> ms-odl.trc file will reflect similar content to:
[2014-08-17T12:57:54.082-07:00] [ossmgmt] [NOTIFICATION] [] [ms.hwadapter.osadp.MSLnx1OSAdapterImpl] [tid: 13] [ecid: nnn.nnn.nnn.nnn:63599:1408305140549:3,0] Error occurred during IBPort population.[[
oracle.ossmgmt.ms.core.MSCell$ExecSageException: CELL-02623: The command "/usr/sbin/ibhosts" returned an error code 255.
at oracle.ossmgmt.ms.core.MSCell.returnCmd(MSCell.java:2575)
at oracle.ossmgmt.ms.core.MSCell.returnCmd(MSCell.java:2514)
at oracle.ossmgmt.ms.core.MSCell.returnCmd(MSCell.java:2486)
at oracle.ossmgmt.ms.hwadapter.osadp.MSLnx1OSAdapterImpl.populateIBPorts(MSLnx1OSAdapterImpl.java:821)
at oracle.ossmgmt.ms.hwadapter.osadp.MSOSStatsLinux.getNetStats(MSOSStatsLinux.java:1063)
at oracle.ossmgmt.ms.core.MSIDBPlanMetricDef.collectNetNicStats(MSIDBPlanMetricDef.java:2150)
at oracle.ossmgmt.ms.core.MSIDBPlanMetricDef.collect(MSIDBPlanMetricDef.java:1881)
at oracle.ossmgmt.ms.core.MSIDBPlanMetricTimerTask.run(MSIDBPlanMetricTimerTask.java:89)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
---> ifconfig output will show bondib0 but will NOT show the ib0 (or ib1) interface, similar to:
bondib0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:nnn.nnn.nnn.nnn Bcast:nnn.nnn.nnn.254 Mask:255.255.254.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
eth0 Link encap:Ethernet HWaddr 00:10:E0:0D:8E:DA
inet addr:nnn.nnn.nnn.nnn Bcast:nnn.nnn.nnn.nnn Mask:255.255.254.0
inet6 addr: nnnn.nnnn.nnnn.nnnn:nnnn.nnnn/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6244 errors:0 dropped:0 overruns:0 frame:0
TX packets:8425 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:783377 (765.0 KiB) TX bytes:2721079 (2.5 MiB)
Memory:ddc60000-ddc80000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:13101 errors:0 dropped:0 overruns:0 frame:0
TX packets:13101 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1681999 (1.6 MiB) TX bytes:1681999 (1.6 MiB)
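The symptom checks above can be sketched as a small script. This is a minimal illustration only, not an Oracle tool: the function names are hypothetical, the interface names (bondib0, ib0, ib1) and the CELL-02623 / ibhosts message are taken from the excerpts above, and the parsing assumes the classic ifconfig layout shown in this note ("<name> Link encap:...").

```python
import re

# Names taken from the symptom description in this note.
BOND_INTERFACE = "bondib0"
EXPECTED_IB_SLAVES = ("ib0", "ib1")

# Shortened copy of the ifconfig excerpt from this note, for illustration.
SAMPLE = """bondib0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
eth0 Link encap:Ethernet HWaddr 00:10:E0:0D:8E:DA
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
"""


def interface_names(ifconfig_text):
    """Return interface names found in classic-format ifconfig output."""
    names = []
    for line in ifconfig_text.splitlines():
        # A stanza header has the form "<name> Link encap:<type> ...".
        match = re.match(r"^([A-Za-z0-9_.:-]+)\s+Link encap:", line)
        if match:
            names.append(match.group(1))
    return names


def missing_ib_slaves(ifconfig_text):
    """Return the expected InfiniBand interfaces that are not listed."""
    present = set(interface_names(ifconfig_text))
    return [name for name in EXPECTED_IB_SLAVES if name not in present]


def matches_symptom(ifconfig_text):
    """True when bondib0 is listed but ib0 and/or ib1 is absent."""
    present = set(interface_names(ifconfig_text))
    return BOND_INTERFACE in present and bool(missing_ib_slaves(ifconfig_text))


def ibhosts_failed(trace_text):
    """True when a line reports CELL-02623 for the /usr/sbin/ibhosts command."""
    return any(
        "CELL-02623" in line and "/usr/sbin/ibhosts" in line
        for line in trace_text.splitlines()
    )
```

To use it, pass the captured ifconfig output to matches_symptom() and the contents of ms-odl.trc to ibhosts_failed(). A match on both is consistent with the symptoms listed above, but does not by itself confirm the cause.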
Changes
Hardware maintenance on the system (such as a battery replacement) may have caused a seating issue with the InfiniBand hardware.
Cause