BDCS 4.11.1 Cluster Extension Leaves NameNodes in Bad Health Due to "NameNode Handler Count"/"NameNode Service Handler Count" Not Being Reset to Reflect the Number of DataNodes in the Extended Cluster
(Doc ID 2476084.1)
Last updated on JULY 20, 2024
Applies to:
Big Data Cloud Service - Version 1.0 and later
Big Data Appliance Integrated Software - Version 4.11.0 and later
Linux x86-64
Symptoms
After a BDCS 4.11.1 cluster extension, the NameNodes (NN) go in and out of "bad"/"red" health.
The example here is based on a cluster extension from 3 nodes to 5 nodes.
Additional symptoms look like:
1. Cloudera Manager (CM) reports a message like the following when the Active NN goes into "bad"/"red" health:
This was 2,662.33% of the configured checkpoint period of 1 hour(s).
Critical threshold: 400.00%. 76,067 transactions had occurred since the last filesystem checkpoint.
This was 7.61% of the configured checkpoint transaction target of 1,000,000.
2. The Active NN log at: /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-<HOSTNAME>.<DOMAINNAME>.log.out, continuously raises:
"In safe mode extension. Safe mode will be turned off automatically in 27 seconds" as below, indicating the Active NN is going into and out of safe mode.
org.apache.hadoop.ipc.RetriableException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/user/spark/spark2ApplicationHistory/.<###>. Name node is in safe mode.
The reported blocks 11340 has reached the threshold 0.9990 of total blocks 11340. The number of live datanodes 5 has reached the minimum number 1. In safe mode extension. Safe mode will be turned off automatically in 27 seconds.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1525)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2824)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2711)
at org.apache.hadoop.hdfs.server.namen ...
3. In CM, hdfs also reports "Problems in the configuration" for the "NameNode Handler Count" (the number of server threads for the NameNode) and the "NameNode Service Handler Count" (the number of server threads for the NameNode used for service calls; only used when the NameNode Service RPC Port is configured). The messages look like:
NameNode Handler Count is recommended to be at least ln(number of datanodes) * 20. Suggested minimum value: 32
and
NameNode Service Handler Count is recommended to be at least ln(number of datanodes) * 20. Suggested minimum value: 32
4. Checking both "NameNode Handler Count" and "NameNode Service Handler Count" in Cloudera Manager shows that the values are set to the default of 30. That would be sufficient for a 3-node cluster, but it is below the required value of "ln(# DataNodes) * 20", which for 5 DataNodes is 32.
a) Navigating in Cloudera Manager: hdfs > Configuration > Search: "NameNode Handler Count" shows a value of 30.
b) Navigating in Cloudera Manager: hdfs > Configuration > Search: "NameNode Service Handler Count" shows a value of 30.
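The arithmetic behind the "ln(number of datanodes) * 20" recommendation can be checked with a short sketch. This is an illustration only, not Cloudera Manager's own validation code: the function names are hypothetical, and rounding the result down to a whole number is an assumption, chosen because it reproduces the suggested minimum of 32 reported for 5 DataNodes.

```python
import math

# Multiplier taken from the Cloudera Manager message quoted above.
HANDLER_FACTOR = 20


def recommended_handler_count(num_datanodes: int) -> float:
    """Return ln(number of DataNodes) * 20 as a raw, unrounded value."""
    if num_datanodes < 1:
        raise ValueError("num_datanodes must be at least 1")
    return math.log(num_datanodes) * HANDLER_FACTOR


def is_sufficient(configured: int, num_datanodes: int) -> bool:
    """Check a configured handler count against the recommendation.

    Assumption: the recommendation is rounded down to a whole number,
    which matches the suggested minimum of 32 shown for 5 DataNodes.
    """
    return configured >= math.floor(recommended_handler_count(num_datanodes))


if __name__ == "__main__":
    default_count = 30  # default value observed in this note
    for nodes in (3, 5):
        raw = recommended_handler_count(nodes)
        verdict = "sufficient" if is_sufficient(default_count, nodes) else "too low"
        print(f"{nodes} DataNodes: ln({nodes}) * 20 = {raw:.2f}; default 30 is {verdict}")
```

For the cluster in this note, the default of 30 passes the check at 3 DataNodes and fails it at 5, which is consistent with the configuration warnings appearing only after the extension.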
Cause