YARN Node Manager is Down with "Node mismatch after server started" Errors After Upgrade to BDA 4.9

(Doc ID 2327370.1)

Last updated on November 12, 2017

Applies to:

Big Data Appliance Integrated Software - Version 4.9.0 and later
Linux x86-64

Symptoms

After upgrading to BDA V4.9, the YARN Node Manager is down on one BDA node. Attempts to restart the Node Manager fail with:

2017-11-12 17:43:03,408 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state STARTED; cause: org.apache.hadoop.service.ServiceStateException: java.io.IOException: Node mismatch after server started, expected '<ip address>:8041' but found '<hostname.fqdn>:8041'
org.apache.hadoop.service.ServiceStateException: java.io.IOException: Node mismatch after server started, expected '<ip address>:8041' but found '<hostname.fqdn>:8041'
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:311)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:545)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:591)
Caused by: java.io.IOException: Node mismatch after server started, expected '<ip address>:8041' but found '<hostname.fqdn>:8041'
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:460)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 5 more

Checking /etc/resolv.conf on all cluster hosts, for example as the 'root' user from Node 1, shows that the file on the node with the failed Node Manager differs from the file on the other nodes.
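One way to run such a comparison is sketched below. It is a minimal illustration, not the command from the original note: the function names collect_checksums and find_outliers are hypothetical, the host names in the usage comment are placeholders, and the sketch assumes passwordless ssh as 'root' to every cluster host and that md5sum is available there.

```shell
#!/bin/sh
# Hypothetical helper: reads "host checksum" pairs on stdin and prints
# every host whose checksum differs from the most common checksum.
find_outliers() {
    awk '
        { sum[$1] = $2; count[$2]++ }
        END {
            best = ""; max = 0
            # Pick the checksum shared by the largest number of hosts.
            for (c in count) if (count[c] > max) { max = count[c]; best = c }
            # Report the hosts that do not have that checksum.
            for (h in sum) if (sum[h] != best) print h
        }
    '
}

# Hypothetical collector: prints "host checksum" for each host argument.
# Assumes passwordless ssh as root to each host (an assumption, not from the note).
collect_checksums() {
    for host in "$@"; do
        printf '%s %s\n' "$host" \
            "$(ssh -n "$host" md5sum /etc/resolv.conf | awk '{print $1}')"
    done
}

# Example usage (host names are placeholders):
#   collect_checksums node01 node02 node03 | find_outliers
```

The majority vote only gives a meaningful answer with three or more hosts and a single dominant version of the file. A host reported by find_outliers is a candidate for closer inspection, for instance by diffing its /etc/resolv.conf against a copy from a healthy node.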

 

Cause
