Read-only file system detection thread was hung. Cell was power cycled to stop the hang

(Doc ID 2296267.1)

Last updated on APRIL 08, 2018

Applies to:

Oracle Exadata Storage Server Software - Version 12.1.2.3.0 and later
Information in this document applies to any platform.

Symptoms

1. Cell server reboots

2. Cell alerthistory shows the following message

alerthistory.out
----------------
1 2017-08-09T13:22:54+05:30 critical "Read-only file system detection thread was hung. Cell was power cycled to stop the hang."

3. OS messages do not show any errors before the reboot

/var/log/messages
-----------------
Aug 9 12:48:54 cel01 nscd: 28973 monitoring directory '/etc' (2)
Aug 9 13:26:56 cel01 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Aug 9 13:26:56 cel01 kernel: Linux version 2.6.39-400.286.3.el6uek.x86_64 (mockbuild@x86-ol6-builder-04) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Fri Oct 21 15:33:03 PDT 2016

4. IOstat from ExaWatcher does not show high disk utilization

IOstat
--------
08/09/17 13:21:31
avg-cpu: %user %nice %system %iowait %steal %idle
0.29 0.01 0.13 0.00 0.00 99.57

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1 0.00 0.00 0.00 2.40 0.00 25.60 10.67 0.00 0.00 0.00 0.00 0.00 0.00
nvme1n1 0.00 0.00 11.20 2.60 985.60 25.60 73.28 0.00 0.13 0.16 0.00 0.10 0.14
nvme2n1 0.00 0.00 0.00 2.80 0.00 24.00 8.57 0.00 0.00 0.00 0.00 0.00 0.00
nvme3n1 0.00 0.00 6.40 3.00 819.20 27.20 90.04 0.00 0.17 0.25 0.00 0.11 0.10
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
...
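
The check in symptom 4 can be scripted when many ExaWatcher samples need reviewing. The following is a minimal sketch, not an Oracle tool: the function names and the 80% threshold are assumptions chosen for illustration, and the column order is taken from the "Device:" header line in the excerpt above.

```python
# Column order copied from the "Device:" header line of the iostat excerpt.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rsec/s", "wsec/s",
           "avgrq-sz", "avgqu-sz", "await", "r_await", "w_await",
           "svctm", "%util"]

# A few lines copied from the excerpt, including non-device lines.
SAMPLE = """\
avg-cpu: %user %nice %system %iowait %steal %idle
0.29 0.01 0.13 0.00 0.00 99.57
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1 0.00 0.00 0.00 2.40 0.00 25.60 10.67 0.00 0.00 0.00 0.00 0.00 0.00
nvme1n1 0.00 0.00 11.20 2.60 985.60 25.60 73.28 0.00 0.13 0.16 0.00 0.10 0.14
nvme3n1 0.00 0.00 6.40 3.00 819.20 27.20 90.04 0.00 0.17 0.25 0.00 0.11 0.10
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
"""


def parse_device_lines(text):
    """Return {device: {column: value}} for every per-device iostat line."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # A device line is a name followed by one value per column.
        if len(fields) != len(COLUMNS) + 1:
            continue
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue  # header line or other non-numeric text
        rows[fields[0]] = dict(zip(COLUMNS, values))
    return rows


def busy_devices(rows, util_threshold=80.0):
    """List devices at or above the (hypothetical) %util threshold."""
    return sorted(dev for dev, m in rows.items() if m["%util"] >= util_threshold)


rows = parse_device_lines(SAMPLE)
print("busy devices:", busy_devices(rows))
print("highest %util:", max(m["%util"] for m in rows.values()))
```

For the sample taken at 13:21:31, no device comes close to the threshold, which matches the observation that disk load was not a factor.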

5. FwTermLog shows the RAID HBA resetting with a firmware crash a few minutes before the reboot

megacli64-FwTermLog.out
------------------------
08/09/17 13:17:16: C1:In MonTask; Seconds from powerup = 0x00e3f0d4
08/09/17 13:17:16: C1:Max Temperature = 82 on Channel 3
Firmware crash dump feature enabled
Crash dump collection will start immediately
copied 513 MB in 9898217 Microseconds
[0]: fp=c03efd20, lr=c13244cc - _MonTask+200
[1]: fp=c03efe48, lr=c1382fc8 - enterMonTask+a0
[2]: fp=c03efe58, lr=c03e1890 - exceptionMachineCheck+114
[3]: fp=c03eff50, lr=c03e0780 - _CommonMachineCheckExceptionHandler+60
[4]: fp=c02caf38, lr=c03490c4 - set_state+5d8
[5]: fp=c02caf98, lr=c0349584 - raid_task_idle_loop+24
[6]: fp=c02cafa8, lr=c1380510 - _mainCore1+140
[7]: fp=c02cafd0, lr=fc8084fc - __startCore1+228
[8]: fp=c02caff8, lr=fc801130 - __start+d8
MonTask: line 243 in file ../../raid/1078int.c
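
To confirm that the firmware crash precedes the power cycle, the two timestamps can be compared directly. This is a rough sketch under two assumptions that the logs themselves do not state: that the FwTermLog timestamp is in the same local time zone as the alerthistory entry, and that "Seconds from powerup" is the controller uptime in seconds expressed in hexadecimal. The function name is hypothetical.

```python
from datetime import datetime, timedelta

ALERT_TS = "2017-08-09T13:22:54+05:30"   # from alerthistory.out
FW_TS = "08/09/17 13:17:16"              # from megacli64-FwTermLog.out
FW_UPTIME_HEX = "0x00e3f0d4"             # "Seconds from powerup" value


def seconds_between(fw_ts, alert_ts):
    """Seconds from the firmware log entry to the cell alert.

    Assumes both timestamps are in the same local time zone, because
    the firmware log entry carries no zone offset of its own.
    """
    fw = datetime.strptime(fw_ts, "%m/%d/%y %H:%M:%S")
    alert = datetime.fromisoformat(alert_ts).replace(tzinfo=None)
    return (alert - fw).total_seconds()


gap = seconds_between(FW_TS, ALERT_TS)
print(f"firmware crash logged {gap:.0f} s before the alert")

# Assumed interpretation: uptime of the controller in seconds.
uptime = timedelta(seconds=int(FW_UPTIME_HEX, 16))
print(f"controller uptime at crash: about {uptime.days} days")
```

With the values above the gap is 338 seconds, that is, a little under six minutes between the MonTask crash entry and the critical alert.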

 

To collect all of the files shown above, run "/opt/oracle.SupportTools/sundiag.sh osw YYYY/MM/DD_HH:MM:SS-YYYY/MM/DD_HH:MM:SS" on the cell, giving the start and end of the time window to collect.
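
The time-range argument is easy to mistype, so it may help to generate it from the alert time. The sketch below only builds the command string in the format shown above; the helper name and the one-hour window on each side of the event are assumptions, not Oracle recommendations.

```python
from datetime import datetime, timedelta

SUNDIAG = "/opt/oracle.SupportTools/sundiag.sh"
RANGE_FORMAT = "%Y/%m/%d_%H:%M:%S"   # YYYY/MM/DD_HH:MM:SS


def sundiag_osw_command(event_time,
                        before=timedelta(hours=1),
                        after=timedelta(hours=1)):
    """Build the sundiag command line for a window around event_time.

    The window size is a hypothetical default; adjust it to cover the
    period that needs to be examined.
    """
    start = (event_time - before).strftime(RANGE_FORMAT)
    end = (event_time + after).strftime(RANGE_FORMAT)
    return f"{SUNDIAG} osw {start}-{end}"


# Alert time taken from the alerthistory entry (cell local time).
command = sundiag_osw_command(datetime(2017, 8, 9, 13, 22, 54))
print(command)
```

The printed command can then be run on the affected cell as root.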

Changes

 

Cause
