After Node Reprovision/Other Server Resiliency Commands - HDFS and Zookeeper Services in "Bad" Health on BDA V4.0

(Doc ID 1954228.1)

Last updated on JULY 14, 2016

Applies to:

Big Data Appliance Integrated Software - Version 4.0 and later
Linux x86-64


After reprovisioning a node whose services were previously migrated (for example, Node 4) with:

# bdacli admin_cluster reprovision <node-name>

The following is observed:

1. In Cloudera Manager (CM), the hdfs service is in 'Bad' health.
The hdfs service reports:

On NameNode, bdanode01 (Standby) 1 failed status directories: /opt/hadoop/dfs/nn. Critical threshold: any.

2. The zookeeper service is in 'Bad' health. The Zookeeper Server Status (from CM > zookeeper) reports that one Zookeeper Follower is down, as shown below:

The Zookeeper Leader on Node 2 is up
The Zookeeper Follower on Node 3 is up
The Zookeeper Follower on Node 1 is down

3. The Zookeeper log reports:

9:38:21.556 AM ERROR org.apache.zookeeper.server.SyncRequestProcessor
Severe unrecoverable error, exiting No space left on device
at Method)
at org.apache.zookeeper.server.persistence.FileTxnLog.commit(
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(
at org.apache.zookeeper.server.ZKDatabase.commit(
at org.apache.zookeeper.server.SyncRequestProcessor.flush(
at org
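
To confirm the same error on another node, the log can be searched for the message shown above. This is a minimal sketch: the function name find_enospc is hypothetical, and the log file path is left as an argument because the location of the Zookeeper log on a given system is not stated here.

```shell
# Sketch: show each "No space left on device" line in a log file,
# with its line number and the line immediately before it.
# The log path must be supplied by the caller (it is not assumed here).
find_enospc() {
    log="${1:?usage: find_enospc <logfile>}"
    # -n prints line numbers, -B1 prints one line of leading context
    grep -n -B1 "No space left on device" "$log"
}
```

For example, find_enospc <path-to-zookeeper-log> prints the ERROR line and the message that follows it, if present.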

4. In /tmp on Node 1, a large amount of storage is being used by the <cluster-name>-cluster-install-summary logs.  For example:

a) Changing directory to /tmp and examining disk usage shows the high consumption:
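
The check in step a) might be scripted along these lines. This is a minimal sketch, not the article's own output: the function name report_usage and the default name pattern are assumptions based on the log name mentioned above, and only the standard df, find, du, sort, and head utilities are used.

```shell
# Sketch: report filesystem usage for a directory and list the largest
# top-level entries whose names match a pattern.
# Defaults (/tmp and *cluster-install-summary*) are assumptions.
report_usage() {
    dir="${1:-/tmp}"
    pattern="${2:-*cluster-install-summary*}"
    # Overall usage of the filesystem that holds the directory
    df -h "$dir"
    # Size in KB of each matching entry, largest first (top 20)
    find "$dir" -maxdepth 1 -name "$pattern" -exec du -sk {} + 2>/dev/null \
        | sort -rn | head -n 20
}
```

Running report_usage /tmp on the affected node lists the filesystem usage followed by the matching log entries, largest first.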


