Failing disk can cause the system to be unresponsive with IO to ZFS zpool hanging
(Doc ID 1316513.1)
Last updated on MAY 31, 2018
Applies to: Solaris Operating System - Version 10 3/05 and later
Information in this document applies to any platform.
A failing disk in a ZFS zpool (mirror or raidz) can cause the system to become unresponsive. For ZFS to mark a device as bad, IO to the device (a zio) must fail with an error (errno set). If the SCSI layer performs its own error recovery before timing out the IO and propagating the error to the upper layer, ZFS cannot tell that the device is bad and continues to schedule IO to it. ZFS has no timeout setting of its own; IO timeouts must be handled at the lower layer (sd/ssd). SCSI drives have many retry tunables. If a drive takes 30 seconds to complete an IO but is still present, and the sd/ssd driver does not mark it bad, ZFS cannot do much about it.
In practice, disk drives that become very slow instead of failing outright are the real problem, and ZFS does not deal with this situation well. ZFS relies on the sd/ssd layers to perform error recovery, so timeouts at these layers influence ZFS IO. As a result, there is no automatic way to take the disk out of service. One can write a script that monitors errors via "kstat -m sderr", keeps a log of device errors and, once they reach a certain threshold, places the device offline by running:
# zpool offline <pool> <device>
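The monitoring script suggested above could look roughly like the following. This is a minimal sketch, not a supported tool: it assumes the parseable output of "kstat -p -m sderr" (module:instance:name:statistic, then whitespace and the value), and the function names, the choice of counters and the threshold are illustrative assumptions. It reports sd/ssd instance names only; mapping an instance such as sd1 to the device name expected by zpool, and the decision to actually run zpool offline, are left to the administrator.

```python
import subprocess

# Counters summed per device. Assumption: these statistic names match what
# the sderr kstat module reports on the system being monitored.
ERROR_STATS = ("Hard Errors", "Soft Errors", "Transport Errors")


def parse_sderr(kstat_text):
    """Sum the selected error counters per device.

    Expects `kstat -p -m sderr` style lines, for example:
        sderr:1:sd1,err:Hard Errors<TAB>3
    Lines for other statistics (Vendor, Size, ...) are ignored.
    """
    totals = {}
    for line in kstat_text.splitlines():
        fields = line.rsplit(None, 1)  # split off the trailing value
        if len(fields) != 2:
            continue
        key, value = fields
        parts = key.strip().split(":", 3)
        if len(parts) != 4 or parts[3] not in ERROR_STATS:
            continue
        device = parts[2].split(",")[0]  # "sd1,err" -> "sd1"
        try:
            totals[device] = totals.get(device, 0) + int(value)
        except ValueError:
            continue  # value was not a number; skip the line
    return totals


def devices_over_threshold(totals, threshold):
    """Return the sorted device names whose error total reached the threshold."""
    return sorted(dev for dev, count in totals.items() if count >= threshold)


def read_sderr():
    """Run kstat on a Solaris host and return its output (not called here)."""
    result = subprocess.run(
        ["kstat", "-p", "-m", "sderr"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a Solaris host, devices_over_threshold(parse_sderr(read_sderr()), 10) would list the instances whose combined counters have reached 10. Because these counters are cumulative, a periodic job would normally log the totals and compare them with the previous run rather than rely on an absolute value.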
A common symptom when a disk begins to fail is that ZFS IO hangs for extended periods of time, until the disk is manually removed. Error recovery messages such as "Command Timeouts", "SCSI transport failed" and "synchronize cache command failed" all concern retryable commands, indicating a disk that is failing but has not completely failed yet. A disk that has not completely failed causes ZFS to wait for the IO pending to that device. ZFS handles the case of a completely failed disk properly.
ZFS behavior in case of an IO error (an error returned by the sd/ssd driver) is dictated by the "failmode" property. In a mirror/raidz configuration, pending IO to the bad vdev (disk) is routed to the good vdevs and the system continues to function. If no good vdev is left and IO to the zpool is not possible, then ZFS, in S10u6 and above, applies the "failmode" property value set for the zpool: "wait", "continue" or "panic". ZFS freezes on complete zpool failure can be avoided by setting the "failmode" property to "continue".
Remember, the "failmode" property takes effect only after a failure has been reported by the sd/ssd layer and no good redundant device is available in the zpool. IO failure conditions are then handled according to the "failmode" property, as described in the ZFS administration guide.
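The order of decisions described above can be summarized in a short sketch. It is a toy model of the text in this note, not of the ZFS implementation: the function names and the boolean argument are hypothetical, and the only commands assumed are "zpool get failmode <pool>" and "zpool set failmode=<value> <pool>".

```python
VALID_FAILMODES = ("wait", "continue", "panic")


def pool_io_outcome(good_vdev_available, failmode="wait"):
    """Model what happens once sd/ssd has reported an IO error to ZFS.

    good_vdev_available: True if redundancy (mirror/raidz) can still serve the IO.
    failmode: value of the zpool "failmode" property.
    """
    if failmode not in VALID_FAILMODES:
        raise ValueError("unknown failmode: %r" % (failmode,))
    if good_vdev_available:
        return "reroute"  # IO goes to the good vdevs; failmode is not consulted
    return failmode       # no good vdev left: the failmode property decides


def get_failmode_command(pool):
    """Build the command that shows the current failmode of a pool."""
    return ["zpool", "get", "failmode", pool]


def set_failmode_command(pool, failmode):
    """Build the command that changes the failmode of a pool."""
    if failmode not in VALID_FAILMODES:
        raise ValueError("unknown failmode: %r" % (failmode,))
    return ["zpool", "set", "failmode=%s" % failmode, pool]
```

For example, pool_io_outcome(False, "continue") returns "continue", while any call with good_vdev_available=True returns "reroute" regardless of the property. Neither branch is reached while the sd/ssd layer is still retrying, which is the hang this note describes.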
Intermittent errors or a failing disk can cause IO to hang. This can be avoided by continuously monitoring the disk errors reported by the sd/ssd layer using "kstat -m sderr", and by reviewing /var/adm/messages and the FMA logs.
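As one way to review /var/adm/messages for the retryable errors named in this note, the sketch below counts lines that contain each phrase. The phrases are taken from the text above and matched as case-insensitive substrings; the exact wording logged on a given system may differ, so the pattern list and the function name should be treated as assumptions to adjust.

```python
from collections import Counter

# Phrases quoted in this note, lower-cased for case-insensitive matching.
# Assumption: the driver messages contain these substrings.
RETRYABLE_PATTERNS = (
    "command timeout",
    "scsi transport failed",
    "synchronize cache command failed",
)


def count_retryable_errors(lines):
    """Count how many log lines contain each retryable-error phrase."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for pattern in RETRYABLE_PATTERNS:
            if pattern in lowered:
                counts[pattern] += 1
    return counts


# Possible use on a Solaris host (not executed here):
#     with open("/var/adm/messages") as log:
#         for phrase, hits in count_retryable_errors(log).items():
#             print(phrase, hits)
```

A rising count for any of these phrases, together with growing sderr counters, suggests a disk that is failing without having failed completely, which is the case where manual intervention with zpool offline may be needed.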