HDFS is in Bad Health/One NameNode is Down Due to a Local Sync Delay Caused by a RAID FW Fault Followed by an Adapter Failure (Doc ID 2264221.1)

Last updated on MAY 31, 2017

Applies to:

Big Data Appliance Integrated Software - Version 4.5.0 to 4.7.0 [Release 4.5 to 4.7]
Linux x86-64

Symptoms

The symptoms are as follows:

1. In Cloudera Manager, HDFS is in bad health because one NameNode is down. The NameNode log shows:

2017-04-23 12:21:21,879 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-04-23 12:21:21,885 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hostname/IP

2. /var/log/dmesg shows the RAID controller firmware (FW) in a FAULT state, followed by an adapter failure and reset:

...
megasas: Found FW in FAULT state, will reset adapter scsi0.
megaraid_sas: resetting fusion adapter scsi0.
megasas: Waiting for FW to come to ready state
megasas: FW now in Ready state
megasas:IOC Init cmd success
megaraid_sas: Reset successful for scsi0.
...
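For illustration only, the two symptoms above can be checked against saved log text with a small script. This is a minimal sketch, assuming the NameNode log and the contents of /var/log/dmesg have already been read into strings. The function names (namenode_exited, faulted_adapters) and the regular expressions are hypothetical. They match only the message strings quoted in this note and are not an official Oracle or Cloudera diagnostic.

```python
import re
from typing import Dict

# Hypothetical patterns built from the log excerpts quoted in this note only.
NAMENODE_EXIT = re.compile(r"org\.apache\.hadoop\.util\.ExitUtil: Exiting with status 1")
NAMENODE_SHUTDOWN = re.compile(r"SHUTDOWN_MSG: Shutting down NameNode")
FW_FAULT = re.compile(r"megasas: Found FW in FAULT state, will reset adapter (scsi\d+)")
RESET_OK = re.compile(r"megaraid_sas: Reset successful for (scsi\d+)")


def namenode_exited(log_text: str) -> bool:
    """Return True if the log shows 'Exiting with status 1' followed by a NameNode shutdown."""
    exit_match = NAMENODE_EXIT.search(log_text)
    if exit_match is None:
        return False
    # Only accept a shutdown message that appears after the exit line.
    return NAMENODE_SHUTDOWN.search(log_text, exit_match.end()) is not None


def faulted_adapters(dmesg_text: str) -> Dict[str, bool]:
    """Map each adapter that reported a FW FAULT to whether a later reset succeeded.

    If an adapter faults more than once, the value reflects its most recent fault.
    """
    result: Dict[str, bool] = {}
    for fault in FW_FAULT.finditer(dmesg_text):
        adapter = fault.group(1)
        # Look for a successful reset of the same adapter after this fault.
        result[adapter] = any(
            ok.group(1) == adapter
            for ok in RESET_OK.finditer(dmesg_text, fault.end())
        )
    return result


# Sample text copied from the excerpts above.
nn_sample = """2017-04-23 12:21:21,879 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-04-23 12:21:21,885 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hostname/IP
"""

dmesg_sample = """megasas: Found FW in FAULT state, will reset adapter scsi0.
megaraid_sas: resetting fusion adapter scsi0.
megasas: Waiting for FW to come to ready state
megasas: FW now in Ready state
megasas:IOC Init cmd success
megaraid_sas: Reset successful for scsi0.
"""

if __name__ == "__main__":
    print(namenode_exited(nn_sample))
    print(faulted_adapters(dmesg_sample))
```

Because the searches are not anchored to the start of a line, the same patterns also match dmesg lines that carry a kernel timestamp prefix. A successful reset in dmesg does not by itself mean the NameNode recovered, so both checks are reported separately.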

Cause
