eh_timeout and eh_deadline Missing in UEK2 and UEK3
(Doc ID 2265257.1)
Last updated on FEBRUARY 17, 2019
Applies to: Linux OS - Version Oracle Linux 6.0 with Unbreakable Enterprise Kernel [2.6.32] to Oracle Linux 7.1 [Release OL6 to OL7U1]
Oracle Cloud Infrastructure - Version N/A and later
When IO timeouts occur, the Linux kernel SCSI error handler proceeds through a sequence of recovery methods, attempting to recover failing devices or transports while causing as little disruption as possible to other IO on the system. The standard recovery levels are executed in order, escalating to the next level whenever a recovery attempt fails or a subsequent SCSI Test Unit Ready (TUR) command fails.
When all operations on the external storage time out (for example, because a failed SAN fabric component neither passes traffic nor reports an error condition), this logic can lead to very long delays before IO is failed on systems with large numbers of devices or targets, since each reset level is repeated for each outstanding command, device, target, and so on.
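To get a feel for why the delay grows with the size of the configuration, the escalation can be approximated with a deliberately simplified model. The sketch below assumes that every attempt at every level (command abort, device reset, target reset, bus reset, host reset) runs to a single fixed timeout before escalating. The function name, the parameters and the numbers in the usage line are hypothetical and chosen for illustration only; real per-step timeouts depend on the driver and the kernel version.

```python
def worst_case_recovery_seconds(commands, devices, targets, buses, step_timeout_s):
    """Rough upper bound on recovery time when every attempt times out.

    Simplifying assumption: each recovery level is tried once per affected
    object, and each attempt takes step_timeout_s seconds before it fails.
    """
    attempts_per_level = {
        "abort": commands,        # one abort per outstanding command
        "device_reset": devices,  # one reset per affected device (LUN)
        "target_reset": targets,  # one reset per affected target
        "bus_reset": buses,       # one reset per affected bus
        "host_reset": 1,          # final step: reset the HBA itself
    }
    return sum(attempts_per_level.values()) * step_timeout_s


# Hypothetical example: 64 outstanding commands, 32 devices, 4 targets, 1 bus,
# 10 seconds per failed attempt.
total = worst_case_recovery_seconds(64, 32, 4, 1, 10)
print(f"worst case: {total} s")
```

Under these assumptions the total scales linearly with the number of outstanding commands, devices and targets, which is why large SAN configurations are the ones most affected.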
By setting an overall limit on the time spent attempting these operations, and proceeding immediately to the HBA reset if this time expires, the features discussed in this solution provide more consistent and predictable system behavior when faults of this nature occur.
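One way to check whether a running kernel exposes these tunables is to look for them in sysfs. The sketch below assumes the locations used by mainline kernels, namely /sys/class/scsi_host/host*/eh_deadline for each SCSI host and /sys/block/sd*/device/eh_timeout for each SCSI disk; this note does not confirm those paths for UEK, so treat them as an assumption. The helper name find_eh_attributes and its sysfs_root parameter are hypothetical.

```python
import glob
import os


def find_eh_attributes(sysfs_root="/sys"):
    """Return the eh_deadline and eh_timeout attribute files found in sysfs.

    The paths follow the mainline kernel layout (an assumption here).
    An empty list means the running kernel does not expose the attribute,
    or that no matching SCSI hosts or disks are present.
    """
    deadline_pattern = os.path.join(
        sysfs_root, "class", "scsi_host", "host*", "eh_deadline")
    timeout_pattern = os.path.join(
        sysfs_root, "block", "sd*", "device", "eh_timeout")
    return {
        "eh_deadline": sorted(glob.glob(deadline_pattern)),
        "eh_timeout": sorted(glob.glob(timeout_pattern)),
    }


def read_attribute(path):
    """Read one sysfs attribute as a stripped string."""
    with open(path) as handle:
        return handle.read().strip()
```

Calling find_eh_attributes() with no arguments inspects the running system; if both lists come back empty on a host that has SCSI devices attached, the kernel most likely lacks the features.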