BDCS 4.11.1 Cluster Extension Leaves NameNodes in Bad Health Due to "NameNode Handler Count"/"NameNode Service Handler Count" Not Being Reset to Reflect the Number of DataNodes in the Extended Cluster (Doc ID 2476084.1)

Last updated on JULY 20, 2024

Applies to:

Big Data Cloud Service - Version 1.0 and later
Big Data Appliance Integrated Software - Version 4.11.0 and later
Linux x86-64

Symptoms

After a BDCS 4.11.1 cluster extension, the NameNodes (NN) repeatedly go in and out of "bad"/"red" health.

The example here is based on a cluster extension from 3 nodes to 5 nodes.

Additional symptoms include:

1. Cloudera Manager (CM) reports a message like the one below when the Active NN goes into "bad"/"red" health:

The filesystem checkpoint was 1 day(s), 2 hour(s), 37 minute(s) old.
This was 2,662.33% of the configured checkpoint period of 1 hour(s).
Critical threshold: 400.00%. 76,067 transactions had occurred since the last filesystem checkpoint.
This was 7.61% of the configured checkpoint transaction target of 1,000,000.
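
The percentages in the message above can be reproduced from the figures it reports. The following Python sketch is only an illustration: the function name is hypothetical and is not part of Cloudera Manager. Because the message shows the checkpoint age only to the minute, the computed value differs slightly from the reported 2,662.33%.

```python
from datetime import timedelta


def checkpoint_age_percent(age: timedelta, period: timedelta) -> float:
    """Checkpoint age as a percentage of the configured checkpoint period."""
    return age / period * 100.0


# Figures taken from the CM message above (seconds are not shown there).
age = timedelta(days=1, hours=2, minutes=37)
period = timedelta(hours=1)
critical_threshold_pct = 400.0

age_pct = checkpoint_age_percent(age, period)
txn_pct = 76_067 / 1_000_000 * 100.0

print(f"checkpoint age: {age_pct:.2f}% of period")   # 2661.67%
print(f"transactions:   {txn_pct:.2f}% of target")   # 7.61%
print("critical:", age_pct > critical_threshold_pct)  # True
```

The health alert is driven by the checkpoint age, which is far above the 400% critical threshold; the transaction count is still well below its target.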


2. The Active NN log at /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-<HOSTNAME>.<DOMAINNAME>.log.out continuously raises "In safe mode extension. Safe mode will be turned off automatically in 27 seconds", as shown below, indicating that the Active NN is going into and out of safe mode.

INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.create from : Call#<##> Retry#<##>
org.apache.hadoop.ipc.RetriableException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/user/spark/spark2ApplicationHistory/.<###>. Name node is in safe mode.
The reported blocks 11340 has reached the threshold 0.9990 of total blocks 11340. The number of live datanodes 5 has reached the minimum number 1. In safe mode extension. Safe mode will be turned off automatically in 27 seconds.
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1525)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2824)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2711)
  at org.apache.hadoop.hdfs.server.namen ...
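
To gauge how often the Active NN reports this condition, the log can be scanned for the safe mode extension message. The following Python sketch is a hypothetical helper, not an Oracle or Cloudera tool; it assumes the message wording matches the excerpt above and works on any iterable of log lines.

```python
import re

# Pattern based on the wording of the log excerpt above.
SAFE_MODE_RE = re.compile(
    r"In safe mode extension\. "
    r"Safe mode will be turned off automatically in (\d+) seconds"
)


def safe_mode_countdowns(lines):
    """Return the countdown in seconds from every safe mode extension line."""
    countdowns = []
    for line in lines:
        match = SAFE_MODE_RE.search(line)
        if match:
            countdowns.append(int(match.group(1)))
    return countdowns


sample = [
    "INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020, call ...",
    "The reported blocks 11340 has reached the threshold 0.9990 of total "
    "blocks 11340. The number of live datanodes 5 has reached the minimum "
    "number 1. In safe mode extension. Safe mode will be turned off "
    "automatically in 27 seconds.",
]

print(safe_mode_countdowns(sample))  # [27]
```

An open file handle for the NameNode log can be passed in place of the sample list; many matches spread over a long period suggest the NameNode keeps re-entering safe mode.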


3. In CM, the hdfs service also reports "Problems in the configuration" for two settings: the "NameNode Handler Count" (the number of server threads for the NameNode) and the "NameNode Service Handler Count" (the number of server threads the NameNode uses for service calls; only used when the NameNode Service RPC Port is configured). The messages look like:

hdfs: NameNode Handler Count
NameNode Handler Count is recommended to be at least ln(number of datanodes) * 20. Suggested minimum value: 32

 and

hdfs: NameNode Service Handler Count
NameNode Service Handler Count is recommended to be at least ln(number of datanodes) * 20. Suggested minimum value: 32


4. Checking both "NameNode Handler Count" and "NameNode Service Handler Count" in Cloudera Manager shows that the values are still set to the default of 30. That is sufficient for a 3 node cluster, but it is below the required value of "ln(# DataNodes) * 20", which for 5 DataNodes is 32.

a) In Cloudera Manager, navigating to hdfs > Configuration and searching for "NameNode Handler Count" shows a value of 30.

b) In Cloudera Manager, navigating to hdfs > Configuration and searching for "NameNode Service Handler Count" shows a value of 30.
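
The recommendation "ln(number of datanodes) * 20" can be worked through for the 3 node and 5 node cases. The following Python sketch is only an illustration: the function name is hypothetical, and truncating the result to an integer is an assumption chosen because it reproduces the suggested minimum of 32 shown for 5 DataNodes. The exact rounding Cloudera Manager applies is not stated here.

```python
import math


def suggested_min_handler_count(datanodes: int) -> int:
    """ln(number of datanodes) * 20, truncated to an integer (assumption)."""
    if datanodes < 1:
        raise ValueError("datanodes must be at least 1")
    return int(math.log(datanodes) * 20)


current_setting = 30  # default value found in Cloudera Manager

for datanodes in (3, 5):
    minimum = suggested_min_handler_count(datanodes)
    status = "sufficient" if current_setting >= minimum else "too low"
    print(f"{datanodes} DataNodes: minimum {minimum}, setting 30 is {status}")
# 3 DataNodes: minimum 21, setting 30 is sufficient
# 5 DataNodes: minimum 32, setting 30 is too low
```

This matches the symptom: the value of 30 satisfied the recommendation before the extension and falls short of it once the cluster has 5 DataNodes.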

Cause


