Disabling AD or MIT Kerberos Fails: YARN Service Will Not Start, Both Resource Managers Are in Standby on BDA V4.2

(Doc ID 2053177.1)

Last updated on JANUARY 23, 2017

Applies to:

Big Data Appliance Integrated Software - Version 4.2.0 and later
Linux x86-64

Symptoms

When Kerberos (MIT Kerberos or AD Kerberos) is disabled on BDA V4.2 (CDH 5.4), the YARN service in Cloudera Manager (CM) may fail to start because both Resource Managers (RMs) start in Standby. After disabling Kerberos, it may be necessary to format the Resource Manager state store.
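
The symptom can also be confirmed from the command line. The following is a minimal sketch, not part of the documented procedure: it assumes the `yarn` client is on the PATH of the node where it runs and that the caller knows the Resource Manager IDs configured in `yarn.resourcemanager.ha.rm-ids`. The IDs `rm1` and `rm2` in the usage note, and the helper names `rm_service_state` and `all_standby`, are hypothetical.

```python
import subprocess
from typing import Callable, Iterable


def rm_service_state(rm_id: str) -> str:
    """Return the HA state reported by `yarn rmadmin -getServiceState <rm_id>`.

    Raises subprocess.CalledProcessError if the command exits non-zero,
    for example when the Resource Manager cannot be reached.
    """
    result = subprocess.run(
        ["yarn", "rmadmin", "-getServiceState", rm_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def all_standby(
    rm_ids: Iterable[str],
    get_state: Callable[[str], str] = rm_service_state,
) -> bool:
    """Return True only if at least one RM was checked and every one is standby."""
    states = [get_state(rm_id).strip().lower() for rm_id in rm_ids]
    return bool(states) and all(state == "standby" for state in states)
```

Calling `all_standby(["rm1", "rm2"])` with the cluster's real RM IDs returns `True` when the cluster is in the state described above. The `get_state` parameter exists only so the check can be exercised without a live cluster.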

For MIT Kerberos removal, see Instructions to Disable Kerberos on Oracle Big Data Appliance with Mammoth V3.*/V4.* Releases (Doc ID 1919431.1). For AD Kerberos removal, see Instructions to Enable/Disable AD Kerberos on Oracle Big Data Appliance with Mammoth V4.2 Release (Doc ID 2029378.1).

For earlier BDA versions, see On Oracle Big Data Appliance 3.* Release and Higher in a Secure Cluster both Resource Managers are in Standby (Doc ID 1920509.1). Some of the steps in that note also apply here, but others differ on BDA V4.2 (CDH 5.4).

When YARN will not start after Kerberos removal because both RMs are in Standby, the Resource Manager logs (hadoop-cmf-yarn-RESOURCEMANAGER-<FQDN-RMhost>.log.out) point to a problem with the state store, as in the excerpt below. The state store therefore has to be cleared.

2015-08-31 13:56:20,778 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=transitionToActive    TARGET=RMHAProtocolService    RESULT=FAILURE    DESCRIPTION=Exception transitioning to active    PERMISSIONS=All users are allowed
2015-08-31 13:56:20,778 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:124)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:122)
    ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFencedException: RMStateStore has been fenced
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:570)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1003)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1044)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1040)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:295)
    ... 5 more
Caused by: org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFencedException: RMStateStore has been fenced
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1102)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.fence(ZKRMStateStore.java:336)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:287)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:478)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    ...
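
To check whether a given Resource Manager log shows this signature, a short script can scan it for the two markers visible in the excerpt above: the failed `transitionToActive` audit entry and the `StoreFencedException` message. This is a minimal sketch under stated assumptions. The function name `find_state_store_symptoms` is hypothetical, and the match strings are taken from this excerpt only, so other releases may word them differently.

```python
import re
from typing import Iterable, List

# Marker strings copied from the log excerpt in this note.
FENCED_MARKER = "StoreFencedException: RMStateStore has been fenced"
FAILED_TRANSITION = re.compile(r"OPERATION=transitionToActive\s+.*RESULT=FAILURE")


def find_state_store_symptoms(lines: Iterable[str]) -> List[str]:
    """Return the log lines that show a fenced state store or a failed
    transition to Active, in the order they appear."""
    hits = []
    for line in lines:
        if FENCED_MARKER in line or FAILED_TRANSITION.search(line):
            hits.append(line.rstrip("\n"))
    return hits
```

The function accepts any iterable of lines, so it can be given an open file handle for the hadoop-cmf-yarn-RESOURCEMANAGER-<FQDN-RMhost>.log.out file on each RM host. An empty result means only that these specific markers are absent from that file.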

 

Cause
