
Solaris sun4v domains may panic after 1101 days of uptime (Doc ID 2245358.1)

Last updated on JANUARY 18, 2018

Applies to:

SPARC M7-16 - Version All Versions and later
Oracle SuperCluster M7 Hardware - Version All Versions and later
SPARC T5-2 - Version All Versions and later
SPARC M5-32 - Version All Versions and later
SPARC S7-2 - Version All Versions and later
Information in this document applies to any platform.
This issue applies to SPARC servers of machine implementation "sun4v". The Solaris command 'uname -i' can be used to display the machine implementation.

Hypervisor 1.0 introduced the bug, but the panic can manifest only on servers running Hypervisor 1.12.x or later. All "sun4v" servers use a hypervisor.

Symptoms

A bug in hypervisor (HV) 1.12.x or later may cause a domain to panic after 1101 days of uptime. The HV version can be displayed with the Solaris command 'ldm -V | grep ")Hypervisor"'.
For example:

% ldm -V | grep ")Hypervisor"
Hypervisor v. 1.15.5.a @(#)Hypervisor 1.15.5.a 2016/08/09 15:21
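
The version check can also be scripted. The following is a minimal Python sketch, assuming the 'ldm -V' Hypervisor line always has the layout of the single example above; the function names are hypothetical. It checks only the 1.12 lower bound stated in this document and says nothing about which releases contain a fix.

```python
import re

def hypervisor_version(line):
    """Return (major, minor) parsed from an 'ldm -V' Hypervisor line, or None."""
    # Assumption: the version follows the literal ")Hypervisor " marker,
    # as in the one sample line shown in this document.
    match = re.search(r"\)Hypervisor\s+(\d+)\.(\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def at_or_above_1_12(line):
    """True if the parsed HV version is 1.12 or later (lower bound only)."""
    version = hypervisor_version(line)
    return version is not None and version >= (1, 12)

sample = "Hypervisor v. 1.15.5.a @(#)Hypervisor 1.15.5.a 2016/08/09 15:21"
print(hypervisor_version(sample), at_or_above_1_12(sample))
```

Comparing (major, minor) tuples avoids the string-comparison trap where "1.9" would sort after "1.12".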

Various types of panic may appear in the HOST console log, including "panic: send_mondo_set: timeout".

The ILOM event log (-> show /SP/logs/event/list) may have sufficient history to check the HOST uptime.
Look for Host "Powered on" or "HV started".

Example (event log):

225 Thu Jan 26 14:17:22 2017 System Log minor
Host: Solaris panicking                                            <======================panic date/time
<snip>
199 Tue Jan 21 13:57:43 2014 System Log minor
Host: Host started
198 Tue Jan 21 13:57:39 2014 System Log minor
Host: HV started                                                     <======================HV start date/time
197 Tue Jan 21 13:49:00 2014 System Log minor
Host: Powered On
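
For a long event log, the two timestamps of interest can be extracted with a short script. This is a sketch only: it assumes every entry is a header line followed by a "Host:" message line, exactly as in the excerpt above, and that day and month names are in English. The function names are hypothetical, and real ILOM output may differ.

```python
import re
from datetime import datetime

# Header layout assumed from the excerpt: "<id> <Day> <Mon> <dd> <hh:mm:ss> <yyyy> ..."
HEADER = re.compile(r"^\d+\s+(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\b")

def parse_events(text):
    """Return (timestamp, message) pairs in the order they appear in the log."""
    events, stamp = [], None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # %a and %b expect English day and month names (C locale).
            stamp = datetime.strptime(header.group(1), "%a %b %d %H:%M:%S %Y")
        elif stamp is not None and line.startswith("Host:"):
            events.append((stamp, line[len("Host:"):].strip()))
            stamp = None
    return events

def hv_start_and_panic(events):
    """Return (hv_start, panic) for the newest panic, or None if either is missing."""
    panic = max((t for t, msg in events if "panicking" in msg), default=None)
    if panic is None:
        return None
    starts = [t for t, msg in events if "HV started" in msg and t <= panic]
    if not starts:
        return None
    return max(starts), panic

sample_log = """\
225 Thu Jan 26 14:17:22 2017 System Log minor
Host: Solaris panicking
199 Tue Jan 21 13:57:43 2014 System Log minor
Host: Host started
198 Tue Jan 21 13:57:39 2014 System Log minor
Host: HV started
197 Tue Jan 21 13:49:00 2014 System Log minor
Host: Powered On
"""

hv_start, panic = hv_start_and_panic(parse_events(sample_log))
print("HV started:", hv_start)
print("Panic:     ", panic)
```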

The GNU date command shipped with Solaris can be used to calculate the date 1101 days before the panic date.

Example:
% /usr/gnu/bin/date -d "Jan 26 2017 - 1101 days"
Tuesday, January 21, 2014 12:00:00 AM PST

Many date calculators are available via Internet search that display the duration between two dates.
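
The same arithmetic can be done without GNU date or an online calculator. The sketch below uses Python's standard datetime module with the dates from the event log example; the names THRESHOLD_DAYS and days_between are illustrative only.

```python
from datetime import date, timedelta

THRESHOLD_DAYS = 1101  # uptime at which this document says the panic may occur

def days_between(start, end):
    """Whole calendar days from start to end."""
    return (end - start).days

panic_day = date(2017, 1, 26)     # panic date from the event log example
hv_start_day = date(2014, 1, 21)  # "HV started" date from the same example

# Equivalent of: date -d "Jan 26 2017 - 1101 days"
print(panic_day - timedelta(days=THRESHOLD_DAYS))  # 2014-01-21

# Duration between the two dates, as a date calculator would report it
print(days_between(hv_start_day, panic_day))       # 1101
```

The span crosses the 2016 leap day, which is why three calendar years plus five days comes to 1101 days rather than 1100.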

If the ILOM event log does not have sufficient history to show when "HV started", it may still be possible to
identify an uptime near 1101 days. The Solaris last command (e.g., last -5 reboot) lists when
Solaris was last booted. This does NOT identify when "HV started", but if the Solaris uptime is near 1101 days it is
reasonable to attribute the panic to this bug.

If either of the two techniques identifies a period of run time, measured

  1. from when HV started, or
  2. from the most recent Solaris boot

and the duration up to the panic is at or near 1101 days, then bug 23193383 has likely manifested.
Oracle can analyze snapshot HOST status logs for confirmation.

Cause
