Node With Multiple VMs Hangs Due To CPU Stall
(Doc ID 2286384.1)
Last updated on APRIL 12, 2022
Applies to: Oracle VM - Version 3.2.1 and later
Information in this document applies to any platform.
This document applies to Oracle Linux-based hosts with Oracle Linux-based guest VMs, on engineered and non-engineered systems alike, with one exception: it does NOT apply to Oracle Exadata Cloud systems (OCI, OCI-C, ExaCC, etc.). If you believe you are encountering the problem described here in an Oracle Exadata Cloud environment, please open a Service Request with Oracle Support.
For Exadata Cloud systems, refer to Doc ID 2459851.1.
A node and all of its running VMs suddenly become unresponsive, and a power cycle is required to clear the problem.
After booting, the system operates normally with no further issues. Investigation into the issue reveals the following:
- The ILOM hostconsole.log file for the node shows the following mlx4_core errors followed by CPU stalls with a stack trace. These may repeat several times.
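As a rough aid for checking a saved copy of hostconsole.log for this pattern, the following Python sketch reports each line that mentions a stall after an earlier mlx4_core line. The exact message text is not reproduced in this document, so the two search patterns (the substring "mlx4_core" and a case-insensitive "stall") and the function name find_stalls_after_mlx4 are assumptions for illustration, not Oracle-provided tooling.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumed search patterns: the exact error text is not given in this document.
MLX4_PATTERN = re.compile(r"mlx4_core")
STALL_PATTERN = re.compile(r"stall", re.IGNORECASE)


def find_stalls_after_mlx4(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (mlx4_line_no, stall_line_no) pairs, 1-indexed.

    Each stall line is paired with the most recent preceding line
    that mentions mlx4_core. Stall lines seen before any mlx4_core
    line are ignored.
    """
    pairs: List[Tuple[int, int]] = []
    last_mlx4 = None
    for line_no, line in enumerate(lines, start=1):
        if MLX4_PATTERN.search(line):
            last_mlx4 = line_no
        elif STALL_PATTERN.search(line) and last_mlx4 is not None:
            pairs.append((last_mlx4, line_no))
    return pairs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python script.py /path/to/saved/hostconsole.log
    with open(sys.argv[1], errors="replace") as fh:
        for mlx4_no, stall_no in find_stalls_after_mlx4(fh):
            print(f"stall at line {stall_no} follows mlx4_core at line {mlx4_no}")
```

A match only indicates that the log is worth reading closely around those line numbers; it does not by itself confirm the issue described here.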