
ILOM sending occasional incorrect sensor readings via IPMI when being polled by hwmgmtd on BDA V4.2 (Doc ID 2081723.1)

Last updated on FEBRUARY 21, 2019

Applies to:

Big Data Appliance X4-2 Hardware - Version All Versions and later
Linux x86-64

Symptoms

The problem symptoms are as follows:

1. On BDA V4.2 (X4-2 hardware, Oracle Linux 6.6, ILOM version 3.1.2.32), the following ILOM "Temperature", "Fan Speed" and "Other" alarms are periodically raised and then cleared, for example:

Sep 21 16:12:36 <HOSTNAME3> hwmgmtd[13804]: State change: overall alarm state changed from "Cleared" (1) to "Critical" (2).
Sep 21 16:12:36 <HOSTNAME3> hwmgmtd[13804]: State change: alarm state of subsystem "Temperature" changed state from "Cleared" (1) to "Critical" (2).
Sep 21 16:12:36 <HOSTNAME3> hwmgmtd[13804]: State change: alarm state of subsystem "Fan Speed" changed state from "Cleared" (1) to "Critical" (2).
Sep 21 16:12:36 <HOSTNAME3> hwmgmtd[13804]: State change: alarm state of subsystem "Other" changed state from "Cleared" (1) to "Major" (3).
Sep 21 16:13:14 <HOSTNAME3> modprobe: WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/.
Sep 21 16:14:20 <HOSTNAME3> modprobe: WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/.
Sep 21 16:15:13 <HOSTNAME3> hwmgmtd[13804]: State change: overall alarm state changed from "Critical" (2) to "Cleared" (1).
Sep 21 16:15:13 <HOSTNAME3> hwmgmtd[13804]: State change: alarm state of subsystem "Temperature" changed state from "Critical" (2) to "Cleared" (1).
Sep 21 16:15:13 <HOSTNAME3> hwmgmtd[13804]: State change: alarm state of subsystem "Fan Speed" changed state from "Critical" (2) to "Cleared" (1).
...


2. Searching /var/log/messages for hwmgmtd also shows many related indicator state changes, for example:

# grep hwmgmtd /var/log/messages
Oct 25 09:08:52 <HOSTNAME3> hwmgmtd[12805]: State change: indicator: /SYS/MB/FM0/OK (ID: 208) changed state from "On" (4) to "Off" (3).
Oct 25 09:08:52 <HOSTNAME3> hwmgmtd[12805]: State change: indicator: /SYS/MB/FM1/OK (ID: 209) changed state from "On" (4) to "Off" (3).
Oct 25 09:08:52 <HOSTNAME3> hwmgmtd[12805]: State change: service indicator: /SYS/SERVICE (ID: 213) changed state from "Off" (3) to "On" (4).
Oct 25 09:08:52 <HOSTNAME3> hwmgmtd[12805]: State change: locator indicator: /SYS/LOCATE (ID: 214) changed state from "Off" (3) to "On" (4).
Oct 25 09:08:52 <HOSTNAME3> hwmgmtd[12805]: State change: indicator: /SYS/SP/OK (ID: 215) changed state from "On" (4) to "Off" (3).
Oct 25 09:08:52 <HOSTNAME3> hwmgmtd[12805]: State change: indicator: /SYS/PS_FAULT (ID: 217) changed state from "Off" (3) to "On" (4).
Oct 25 09:09:54 <HOSTNAME3> hwmgmtd[12805]: State change: indicator: /SYS/MB/FM0/OK (ID: 208) changed state from "Off" (3) to "On" (4).
...

3. However, the ILOM snapshot shows that the fault LEDs are off, FMA has not logged any faults, and the SEL is clear.
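The pattern in items 1 and 2 (alarms and indicators that leave their normal state and return to it shortly afterwards, in the excerpts above after roughly one to three minutes, while ILOM itself records no fault) can be checked mechanically. The following is a minimal sketch in Python, not an Oracle-provided tool. It assumes the hwmgmtd line formats shown above, treats the state a component held before its first logged change as its baseline, and uses a hypothetical default year (2000) because syslog timestamps omit the year. The names `parse_state_changes` and `transient_episodes` are illustrative only.

```python
import re
from datetime import datetime

# Matches hwmgmtd "State change" lines for alarm subsystems and indicators, e.g.
#   ... hwmgmtd[13804]: State change: alarm state of subsystem "Temperature" changed state from "Cleared" (1) to "Critical" (2).
#   ... hwmgmtd[12805]: State change: service indicator: /SYS/SERVICE (ID: 213) changed state from "Off" (3) to "On" (4).
# The "overall alarm state" lines use different wording and are deliberately not matched.
STATE_RE = re.compile(
    r'^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ hwmgmtd\[\d+\]: State change: '
    r'(?:alarm state of subsystem "(?P<subsystem>[^"]+)"'
    r'|(?:\w+ )?indicator: (?P<indicator>\S+) \(ID: \d+\))'
    r' changed state from "(?P<old>[^"]+)" \(\d+\) to "(?P<new>[^"]+)" \(\d+\)\.$'
)


def parse_state_changes(lines, year=2000):
    """Yield (timestamp, component, old_state, new_state) for each hwmgmtd state change.

    `year` is an assumption: syslog omits it, so one is supplied for parsing.
    Episodes that cross a year boundary are therefore not handled.
    """
    for line in lines:
        m = STATE_RE.match(line.strip())
        if not m:
            continue  # skip non-hwmgmtd lines and the "overall alarm state" lines
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        component = m['subsystem'] or m['indicator']
        yield ts, component, m['old'], m['new']


def transient_episodes(lines, year=2000):
    """Pair each departure from a component's baseline state with its next return.

    Returns a list of (component, state_entered, start_time, seconds_until_restored).
    Components that never return to baseline in the given lines are not reported.
    """
    baseline = {}  # component -> state it held before its first logged change
    opened = {}    # component -> (state entered, time it left baseline)
    episodes = []
    for ts, comp, old, new in parse_state_changes(lines, year):
        baseline.setdefault(comp, old)
        if new == baseline[comp]:
            if comp in opened:
                state, start = opened.pop(comp)
                episodes.append((comp, state, start, (ts - start).total_seconds()))
        elif comp not in opened:
            opened[comp] = (new, ts)
    return episodes
```

On a node, passing an open handle of /var/log/messages to `transient_episodes` would list each short-lived alarm or indicator change with its duration; many brief Critical-then-Cleared episodes with no matching fault in the ILOM snapshot are consistent with the symptoms described in this note.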

Cause




