
HBase Master Goes Down with "WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs" (Doc ID 2515432.1)

Last updated on APRIL 17, 2023

Applies to:

Big Data Appliance Integrated Software - Version 4.11.0 and later
Linux x86-64

Symptoms

The HBase service goes down on a regular basis, with the HBase master(s) reported in bad health.

The HBase master log, hbase-cmf-hbase-MASTER-<HOSTNAME>.<DOMAINNAME>.log.out, shows an error while splitting logs, followed by the master aborting:

<TIMESTAMP> WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://<CLUSTERNAME>-ns/hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<##>-splitting] installed = 1 but only 0 done
<TIMESTAMP> FATAL org.apache.hadoop.hbase.master.HMaster: Failed to become active master
java.io.IOException: error or interrupted while splitting logs in [hdfs://<CLUSTERNAME>-ns/hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<ID>-splitting] Task = installed = 1 done = 0 error = 1
    at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:291)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:436)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:346)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:337)
    at org.apache.hadoop.hbase.master.HMaster.splitMetaLogBeforeAssignment(HMaster.java:1073)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:770)
    at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:194)
    at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1834)
    at java.lang.Thread.run(Thread.java:748)
<TIMESTAMP> FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController, com.cloudera.navigator.audit.hbase.MasterAuditCoProcessor]
<TIMESTAMP> FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: error or interrupted while splitting logs in [hdfs://<CLUSTERNAME>-ns/hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<ID>-splitting] Task = installed = 1 done = 0 error = 1
...
<TIMESTAMP> INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.
<TIMESTAMP> INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
...
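
To check whether a given master log matches the signature above, a small script can help. The following is a minimal sketch, not an Oracle-provided tool: the function name scan_master_log and the summary keys are hypothetical, and the match strings are copied from the log excerpt above. It only counts occurrences and does not diagnose the cause.

```python
import re

# Matches the task counters at the end of the SplitLogManager warning,
# e.g. "installed = 1 but only 0 done" (format taken from the excerpt above).
TASK_COUNTS = re.compile(r"installed = (\d+) but only (\d+) done")


def scan_master_log(lines):
    """Summarize split-log failures seen in HBase master log lines.

    Hypothetical helper for illustration only.
    """
    summary = {"split_failures": [], "failed_to_become_active": 0}
    for line in lines:
        if "SplitLogManager: error while splitting logs" in line:
            match = TASK_COUNTS.search(line)
            if match:
                installed, done = map(int, match.groups())
                summary["split_failures"].append((installed, done))
        elif "HMaster: Failed to become active master" in line:
            summary["failed_to_become_active"] += 1
    return summary


# Two lines copied from the excerpt above, placeholders left as-is.
sample = [
    "<TIMESTAMP> WARN org.apache.hadoop.hbase.master.SplitLogManager: "
    "error while splitting logs in [hdfs://<CLUSTERNAME>-ns/hbase/WALs/"
    "<HOSTNAME>.<DOMAINNAME>,60020,<##>-splitting] installed = 1 but only 0 done",
    "<TIMESTAMP> FATAL org.apache.hadoop.hbase.master.HMaster: "
    "Failed to become active master",
]
print(scan_master_log(sample))
# prints {'split_failures': [(1, 0)], 'failed_to_become_active': 1}
```

For a real log, the same function can be given an open file object, since it only iterates over lines.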


The associated Region Server log, /var/log/hbase/hbase-cmf-hbase-REGIONSERVER-<HOSTNAME>.<DOMAINNAME>.log.out, reports:

<TIMESTAMP> INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: New WAL /hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<PATH>/<HOSTNAME>.<DOMAINNAME><FILENAME1.meta>
<TIMESTAMP> WARN org.apache.hadoop.hbase.regionserver.wal.FSHLog: Riding over failed WAL close of hdfs://<CLUSTERNAME>-ns/hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<PATH>/<HOSTNAME>.<DOMAINNAME><FILENAME1.meta>, cause="All datanodes DatanodeInfoWithStorage[<PRIVATE IP HOST>:50010,DS-<ID>,DISK] are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
<TIMESTAMP> INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Rolled WAL /hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<PATH>/<HOSTNAME>.<DOMAINNAME><FILENAME1.meta> with entries=9, filesize=3.49 KB; new WAL /hbase/WALs/<HOSTNAME>.<DOMAINNAME>,60020,<PATH>/<HOSTNAME>.<DOMAINNAME><FILENAME2.meta>
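
When many Region Server logs need to be reviewed, it can be useful to tally which DataNode addresses appear in the "Riding over failed WAL close" warnings. The sketch below assumes the message format shown in the excerpt above; the function name bad_datanodes and the sample address 10.0.0.5 are hypothetical. It reports what the log says and makes no claim about why the DataNodes were marked bad.

```python
import re

# Captures the "<address>:<port>" that follows "DatanodeInfoWithStorage["
# in the warning text, up to the first comma.
DATANODE = re.compile(r"DatanodeInfoWithStorage\[([^,\]]+)")


def bad_datanodes(lines):
    """Count DataNode addresses named in failed-WAL-close warnings.

    Hypothetical helper for illustration only.
    """
    counts = {}
    for line in lines:
        if "Riding over failed WAL close" not in line:
            continue
        for address in DATANODE.findall(line):
            counts[address] = counts.get(address, 0) + 1
    return counts


# Sample line modeled on the excerpt above; 10.0.0.5 is a made-up address.
sample = [
    "<TIMESTAMP> WARN org.apache.hadoop.hbase.regionserver.wal.FSHLog: "
    "Riding over failed WAL close of hdfs://<CLUSTERNAME>-ns/hbase/WALs/..., "
    'cause="All datanodes DatanodeInfoWithStorage[10.0.0.5:50010,DS-<ID>,DISK] '
    'are bad. Aborting...", errors=1; '
    "THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK",
]
print(bad_datanodes(sample))
# prints {'10.0.0.5:50010': 1}
```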

 

Cause
