After Enabling Kerberos on Oracle Big Data Appliance 4.5.0 Both Resource Managers Go into Standby Mode (Doc ID 2151768.1)

Last updated on JUNE 24, 2016

Applies to:

Big Data Appliance Integrated Software - Version 4.5.0 and later
Linux x86-64

Symptoms

After enabling Kerberos on BDA 4.5.0, both Resource Managers go into standby mode.

The Resource Manager logs show errors like the following:

2016-06-16 21:07:39,030 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=Users [yarn] and members of the groups [<users>] are allowed
2016-06-16 21:07:39,030 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
  at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:124)
  at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:812)
  at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:417)
  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
  at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
  at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:122)
  ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFencedException: RMStateStore has been fenced
  at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:577)
  at org.apache.hadoop.service.Abstr
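
To check whether a given Resource Manager log contains the same fencing error, a small helper along these lines may be useful. This is only a sketch: the function name has_fenced_store is made up for illustration, and the log path in the usage comment is a placeholder that depends on the installation.

```shell
#!/bin/sh
# Sketch: report whether a log file contains the "RMStateStore has been
# fenced" error shown in the stack trace above.
# has_fenced_store is a hypothetical helper name, not part of Hadoop.

has_fenced_store() {
  # $1 = path to a Resource Manager log file
  grep -q 'StoreFencedException: RMStateStore has been fenced' "$1"
}

# Usage (the path is a placeholder, adjust it to the local log location):
# has_fenced_store /path/to/resourcemanager.log && echo "fenced state store found"
```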



Following the steps below from the MOS note "After Enabling Kerberos on Oracle Big Data Appliance 4.5.0 Both Resource Managers Went In To Standby" (Doc ID 2151768.1) does not resolve the issue:

1. Home ---> Status ---> Zookeeper ---> Configuration ---> Server Default Group ---> Advanced

2. Edit "Java Configuration Options for Zookeeper Server" parameter and enter
-Dzookeeper.DigestAuthenticationProvider.superDigest=

Using the sample output from step 5 above, the property setting would be:
-Dzookeeper.DigestAuthenticationProvider.superDigest=super:cY+9eK20soteVC3fQ83SXDvwlP0=

3. Click 'Save Changes'.

4. Restart Zookeeper service.
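
For reference, ZooKeeper's digest scheme represents a credential as the user name, a colon, and the Base64-encoded SHA-1 hash of the full "user:password" string. The sketch below shows how a value of that form could be computed with standard Linux tools. The function name zk_digest and the password in the usage comment are illustrative assumptions; this note does not state which password produced the sample digest shown in step 2.

```shell
#!/bin/sh
# Sketch: build a ZooKeeper digest value from a "user:password" pair.
# Result format: user:base64(SHA-1("user:password")).
# zk_digest is a hypothetical helper name used only for illustration.

zk_digest() {
  id_password="$1"                 # for example "super:<password>"
  user="${id_password%%:*}"        # text before the first colon
  # Hex SHA-1 of the full "user:password" string (no trailing newline).
  hex=$(printf '%s' "$id_password" | sha1sum | cut -d' ' -f1)
  # Convert each hex byte to an octal escape so printf can emit raw bytes.
  escapes=""
  while [ -n "$hex" ]; do
    byte=$(printf '%s' "$hex" | cut -c1-2)
    hex=$(printf '%s' "$hex" | cut -c3-)
    escapes="$escapes$(printf '\\%03o' "0x$byte")"
  done
  printf '%s:%s\n' "$user" "$(printf "$escapes" | base64)"
}

# Usage (the password is a placeholder):
# zk_digest 'super:MyPlaceholderPassword'
```

The printed value is what would follow `-Dzookeeper.DigestAuthenticationProvider.superDigest=` in step 2, and the plain-text password is what would be passed to `addauth digest` in the ZooKeeper CLI.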

a) Stop the YARN service from Cloudera Manager.
b) Log in to the ZooKeeper CLI as shown below. Note: you must log in on the leader node. To identify the current leader, go to the ZooKeeper service in Cloudera Manager and check the Status tab.

From the leader node (over ssh), run the commands below:

# cd /opt/cloudera/parcels/CDH/lib/zookeeper/bin/
# ./zkCli.sh
[zk: localhost:2181(CONNECTED) 0]
[zk: localhost:2181(CONNECTED) 6] addauth digest super:cloudera
[zk: localhost:2181(CONNECTED) 7] ls /rmstore/ZKRMStateRoot
[AMRMTokenSecretManagerRoot, RMAppRoot, RMDTSecretManagerRoot, RMVersionNode, EpochNode]
[zk: localhost:2181(CONNECTED) 8] setAcl /rmstore/ZKRMStateRoot world:anyone:rwcda
cZxid = 0xb00009ed2
ctime = Thu Jun 26 17:34:51 EDT 2014
mZxid = 0xb00009ed2
mtime = Thu Jun 26 17:34:51 EDT 2014
pZxid = 0xb3000259f7
cversion = 11435929
dataVersion = 0
aclVersion = 192
ephemeralOwner = 0x0
dataLength = 0
numChildren = 5
[zk: localhost:2181(CONNECTED) 9] getAcl /rmstore/ZKRMStateRoot
'world,'anyone
: cdrwa
[zk: localhost:2181(CONNECTED) 10] rmr /rmstore/ZKRMStateRoot

c) Close the ZooKeeper CLI session.

d) Start the YARN service from Cloudera Manager.

However, even after following these steps to remove and re-create /rmstore/ZKRMStateRoot, both Resource Managers go into standby mode with the errors shown above when YARN starts.
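
One way to confirm the state of the Resource Managers from the command line is `yarn rmadmin -getServiceState <rm-id>`, which prints the HA state of a single Resource Manager. The sketch below wraps that check. The helper name all_standby and the IDs rm1 and rm2 are assumptions; the actual IDs come from the cluster's yarn.resourcemanager.ha.rm-ids setting, and on a Kerberized cluster a valid ticket is needed first.

```shell
#!/bin/sh
# Sketch: decide whether every Resource Manager reports "standby".
# Reads one HA state per line on stdin, as printed by
# `yarn rmadmin -getServiceState <rm-id>`.
# all_standby is a hypothetical helper name used only for illustration.

all_standby() {
  count=0
  while IFS= read -r state; do
    [ -z "$state" ] && continue          # skip blank lines
    count=$((count + 1))
    [ "$state" = "standby" ] || return 1 # any other state means "not all standby"
  done
  [ "$count" -gt 0 ]                     # no input at all is not a match
}

# Usage on a cluster node (rm1 and rm2 are placeholder RM IDs):
# for id in rm1 rm2; do yarn rmadmin -getServiceState "$id"; done | all_standby \
#   && echo "Both Resource Managers are in standby"
```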

Cause
