Exadata Database Or ASM Instance Hangs On IO Request When Single Cell Has Problem
(Doc ID 2002084.1)
Last updated on FEBRUARY 16, 2019
Applies to: Oracle Exadata Storage Server Software - Version 22.214.171.124.1 to 126.96.36.199.0 [Release 11.2 to 12.1]
Information in this document applies to any platform.
On Exadata, IO-related waits cause a Database or ASM instance to hang or perform very slowly.
The problem lies with a single cell server: either all of its disks are offline, or the cellsrv process is hung, which causes IO to hang in the disk subsystem.
The cluster logs show typical errors such as the following:
2015-02-26 14:49:26.033509: DISKMON:4119796032: dskm_bcast_oss6: oss_wait for device o/192.168.*.*(inc 0) did not complete in 5000 msec
The LGWR background trace files show:
First ossnet_wait_all timeout at :
ORA-27626: Exadata Error :2201 (IO cancelled due to slow/hung disk)
Exadata error : 'IO cancelled due to slow/hung disk'
IO elapsed time : 213596 usec Time waited on I/O: 213596 usec
ossnet_wait_all:WAITED TOO LONG for network request completion :60001
Dumping State information
As the excerpt above shows, the timeout was preceded by an I/O cancel error from the cell, which indicates a problem with the disk subsystem.
After the dump, LGWR tries to reconnect, and the trace file shows:
connect:sosstcread failed- 1923.168.*.*, return bytes: 0
OS system dependent operation:unexpected_size3 failed with status:115
OS failure message :Operation now in progress
failure occured at: sosstcpredt
connect retry:sleeping for 2 seconds, connect 2 out of maximum 7 attempts
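The symptoms above can be checked for mechanically. The following is a minimal sketch, not an Oracle-supplied tool: it scans log or trace lines for the four message patterns quoted in this note. The label names, the function names scan_lines and scan_file, and the exact regular expressions are assumptions made for illustration; message wording may differ between releases, so adjust the patterns to what your own logs contain.

```python
import re

# Patterns derived from the excerpts quoted in this note.
# The dictionary keys are illustrative labels, not Oracle terminology.
SIGNATURES = {
    "diskmon_oss_wait_timeout": re.compile(
        r"oss_wait for device .+? did not complete in \d+ msec"
    ),
    "io_cancelled_ora_27626": re.compile(r"ORA-27626"),
    "ossnet_wait_too_long": re.compile(r"ossnet_wait_all:\s*WAITED TOO LONG"),
    "cell_connect_retry": re.compile(r"connect retry:\s*sleeping for \d+ seconds"),
}


def scan_lines(lines):
    """Return {label: [(line_number, text), ...]} for each signature that matched."""
    hits = {}
    for number, line in enumerate(lines, start=1):
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.setdefault(label, []).append((number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan one log or trace file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)
```

Calling scan_file on a DISKMON log or an LGWR trace file (the path is whatever your environment uses) returns an empty dictionary when none of the patterns appear. Seeing the ORA-27626 label together with the ossnet_wait_all and connect retry labels matches the sequence described above.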
It is also possible that the system disk on the cell server was read-only.