HBase Master Goes Down with "WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs"
(Doc ID 2515432.1)
Last updated on APRIL 17, 2023
Applies to:
Big Data Appliance Integrated Software - Version 4.11.0 and later
Linux x86-64
Symptoms
The HBase service goes down on a regular basis, with the HBase master(s) reported in bad health.
The HBase master log, hbase-cmf-hbase-MASTER-<HOSTNAME>.<DOMAINNAME>.log.out, shows an error while splitting logs, as below:
<TIMESTAMP> WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://<CLUSTERNAME>-ns/hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<##>-splitting] installed = 1 but only 0 done
<TIMESTAMP> FATAL org.apache.hadoop.hbase.master.HMaster: Failed to become active master
java.io.IOException: error or interrupted while splitting logs in [hdfs://<CLUSTERNAME>-ns/hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<ID>-splitting] Task = installed = 1 done = 0 error = 1
    at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:291)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:436)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:346)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:337)
    at org.apache.hadoop.hbase.master.HMaster.splitMetaLogBeforeAssignment(HMaster.java:1073)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:770)
    at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:194)
    at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1834)
    at java.lang.Thread.run(Thread.java:748)
<TIMESTAMP> FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController, com.cloudera.navigator.audit.hbase.MasterAuditCoProcessor]
<TIMESTAMP> FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: error or interrupted while splitting logs in [hdfs://<CLUSTERNAME>-ns/hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<ID>-splitting] Task = installed = 1 done = 0 error = 1
...
<TIMESTAMP> INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.
<TIMESTAMP> INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
...
The associated Region Server log, /var/log/hbase/hbase-cmf-hbase-REGIONSERVER-<HOSTNAME>.<DOMAINNAME>.log.out, reports:
<TIMESTAMP> INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: New WAL /hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<PATH>/<HOSTNAME>.<DOMAINNAME><FILENAME1.meta>
<TIMESTAMP> WARN org.apache.hadoop.hbase.regionserver.wal.FSHLog: Riding over failed WAL close of hdfs://<CLUSTERNAME>-ns/hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<PATH>/<HOSTNAME>.<DOMAINNAME><FILENAME1.meta>, cause="All datanodes DatanodeInfoWithStorage[<PRIVATE IP HOST>:50010,DS-<ID>,DISK] are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
<TIMESTAMP> INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Rolled WAL /hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<PATH>/<HOSTNAME>.<DOMAINNAME><FILENAME1.meta> with entries=9, filesize=3.49 KB; new WAL /hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<PATH>/<HOSTNAME>.<DOMAINNAME><FILENAME2.meta>
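To check whether a cluster shows the same signature, the messages quoted above can be searched for mechanically. The following is a minimal Python sketch and is not part of the original note: the function name scan and the labels in SIGNATURES are hypothetical, and the patterns are copied from the log excerpts above, so they may need adjusting for other HBase versions.

```python
import re
from collections import Counter

# Message fragments copied from the log excerpts in this note.
# The labels on the left are made up for this sketch.
SIGNATURES = {
    "split_failed": re.compile(r"SplitLogManager: error while splitting logs"),
    "master_abort": re.compile(r"HMaster: Failed to become active master"),
    "wal_close_failed": re.compile(r"FSHLog: Riding over failed WAL close"),
}

# Captures the WAL directory named in "... splitting logs in [<dir>] ...".
SPLIT_DIR = re.compile(r"splitting logs in \[([^\]]+)\]")


def scan(lines):
    """Count symptom lines and collect the WAL directories that failed to split."""
    counts = Counter()
    split_dirs = set()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
        match = SPLIT_DIR.search(line)
        if match:
            split_dirs.add(match.group(1))
    return counts, sorted(split_dirs)


if __name__ == "__main__":
    import sys

    # Usage (paths are whatever log files you want to inspect):
    #   python scan_logs.py <master log> <region server log>
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            counts, dirs = scan(handle)
        print(path, dict(counts), dirs)
```

A non-zero split_failed or master_abort count in the master log, together with wal_close_failed entries in a Region Server log, corresponds to the symptoms described in this section. The script only matches text and does not inspect HDFS or the WAL files themselves.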
Cause