Sun Fire[TM] 15K/12K/E20K/E25K Servers: RCM Daemon hanging, causing DR operations to hang (Doc ID 1002125.1)

Last updated on MAY 01, 2017

Applies to:

Sun Fire 15K Server - Version All Versions and later
Sun Fire E25K Server - Version All Versions and later
All Platforms

Symptoms

Remote Dynamic Reconfiguration (DR) operations (from the System Controller) or local DR operations (from the domain) work normally until one DR operation stops responding. Messages like the following are then reported in the $SMSLOGGER/domain_Id/messages file:

DCA/DCS communication error

and/or

dca[...]-S(): [... ERR DCSInterface.cc 378] message receive failed: DCSInterface :: receiveResponse errCode:502
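
To check a saved copy of the messages file for these two signatures, a minimal Python sketch could look like the following. The function names, the regular expressions, and the idea of passing the log path as an argument are illustrative assumptions, not part of SMS or any Oracle tooling; only the two message strings come from this article.

```python
import re

# The two message signatures quoted in this article.
# The regular expressions are an illustrative assumption about how to match them.
DR_HANG_PATTERNS = [
    re.compile(r"DCA/DCS communication error"),
    re.compile(r"DCSInterface\s*::\s*receiveResponse errCode:502"),
]


def find_dr_hang_messages(lines):
    """Return (line_number, text) pairs for lines matching either signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in DR_HANG_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan a messages file; the caller supplies the path (hypothetical helper)."""
    with open(path, errors="replace") as handle:
        return find_dr_hang_messages(handle)
```

A match only shows that the symptom was logged; it does not confirm that the rcm_daemon is the cause.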

In some cases, it may not be possible to kill the associated process (cfgadm, rcfgadm, deleteboard, or showdevices).

Note: for background information on the DR mechanism, please refer to Document 1003582.1 (What Happens in a Sun Fire[TM] 15K/12K DR Slot0 Detach Operation).

Note: if none of the symptoms described below applies, and the rcm_daemon stack signature does not match the one described in this article, you are more likely facing a different issue. See the References section below for more troubleshooting steps.

Cause
