On BDA 4.2 the Cloudera Manager Agent is Not Able to Communicate with this Role's Web Server (Doc ID 2122658.1)

Last updated on AUGUST 29, 2022

Applies to:

Big Data Appliance Integrated Software - Version 4.2.0 and later
Linux x86-64

Symptoms

NOTE: In the examples that follow, user details, cluster names, hostnames, directory paths, filenames, etc. represent a fictitious sample (and are used to provide an illustrative example only). Any similarity to actual persons, or entities, living or dead, is purely coincidental and not intended in any manner.

The alerts below were received from Node 4. For some time, Node 4 was disconnected from Cloudera Manager (CM).

The health test result for RESOURCE_MANAGER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.
Time: Mar 24, 2016 2:35:56 PM
View Details on <bdanode03.example.com>
Monitor Startup: false
Role: resourcemanager (<bdanode04>)
Role Type: ResourceManager
Cluster: <cluster name>
Cluster Display Name: <cluster name>
Service: yarn
Service Display Name: yarn
Service Type: YARN (MR2 Included)
Hosts: <bdanode04.example.com>

Health Test Name: RESOURCE_MANAGER_WEB_METRIC_COLLECTION
Event Code: Role health test bad
Severity: Critical
Content: The health test result for RESOURCE_MANAGER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.

 

From the Node 4 log:

2016-03-24 14:33:25,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected
2016-03-24 14:33:58,208 WARN org.mortbay.log: EXCEPTION
javax.net.ssl.SSLException: Received close_notify during handshake
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
...
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
2016-03-24 14:35:51,667 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26811ms for sessionid <SESSIONID1>, closing socket connection and attempting reconnect
2016-03-24 14:35:51,668 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 46663ms for sessionid <SESSIONID2>, closing socket connection and attempting reconnect
2016-03-24 14:35:51,680 WARN org.mortbay.log: EXCEPTION
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake ...
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.io.EOFException: SSL peer shut down incorrectly
at sun.security.ssl.InputRecord.read(InputRecord.java:505)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
... 5 more
2016-03-24 14:35:51,698 WARN org.mortbay.log: EXCEPTION
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:980)
...
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
... 5 more
2016-03-24 14:35:51,768 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:Disconnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2016-03-24 14:35:51,768 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session disconnected
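
The ZooKeeper client timeout lines in the Node 4 log state how long the ResourceManager went without hearing from the ZooKeeper server. The following is a minimal Python sketch for pulling those durations out of a log excerpt. It is an illustration only and not part of the Oracle procedure; the function name find_session_timeouts and the regular expression are assumptions written against the message format shown above.

```python
import re

# Assumed pattern, written against the ClientCnxn timeout message shown above.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"have not heard from server in (?P<ms>\d+)ms "
    r"for sessionid (?P<sid>\S+),"
)


def find_session_timeouts(lines):
    """Return (timestamp, silent_ms, session_id) for each timeout line."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append(
                (match.group("ts"), int(match.group("ms")), match.group("sid"))
            )
    return hits


# Lines copied from the Node 4 excerpt above; the last one is not a timeout.
sample = [
    "2016-03-24 14:35:51,667 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26811ms for sessionid <SESSIONID1>, closing socket connection and attempting reconnect",
    "2016-03-24 14:35:51,668 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 46663ms for sessionid <SESSIONID2>, closing socket connection and attempting reconnect",
    "2016-03-24 14:35:51,680 WARN org.mortbay.log: EXCEPTION",
]

for ts, silent_ms, sid in find_session_timeouts(sample):
    print(ts, silent_ms, sid)
```

In this excerpt the two sessions had been silent for roughly 27 and 47 seconds when the timeouts were logged.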

The Node 3 log shows:

2016-03-24 14:35:15,565 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: AuthenticationToken expired
2016-03-24 14:35:15,570 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: AuthenticationToken expired
2016-03-24 14:35:30,559 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: AuthenticationToken expired
2016-03-24 14:35:30,562 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: AuthenticationToken expired
2016-03-24 14:35:36,021 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2016-03-24 14:35:36,034 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: <ID>
2016-03-24 14:35:36,034 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /yarn-leader-election/yarnRM/ActiveBreadCrumb to indicate that the local node is the most recent active...
2016-03-24 14:35:36,045 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/var/run/cloudera-scm-agent/process/12223-yarn-RESOURCEMANAGER/yarn-site.xml
2016-03-24 14:35:36,047 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS
2016-03-24 14:35:36,047 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to active state

And from what is believed to have been the ZooKeeper (ZK) leader:
2016-03-24 14:33:25,686 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /<PRIVATE_IP_HOST>:20783 which had sessionid <SESSIONID>
2016-03-24 14:33:25,695 ERROR org.apache.zookeeper.server.NIOServerCnxn: Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1075)
at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1112)
at org.apache.zookeeper.server.DataTree.setWatches(DataTree.java:1327)
at org.apache.zookeeper.server.ZKDatabase.setWatches(ZKDatabase.java:384)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:304)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2016-03-24 14:33:25,696 ERROR org.apache.zookeeper.server.NIOServerCnxn: Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1075)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2016-03-24 14:33:50,553 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /<PRIVATE_IP_HOST3>:44312
2016-03-24 14:33:50,553 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /<PRIVATE_IP_HOST3>:44312
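
The three excerpts above are easier to follow when merged into one sequence ordered by timestamp. The following is a minimal Python sketch of that merge. It is an illustration only: the names parse_ts and build_timeline and the source labels are hypothetical, and it assumes the clocks on all nodes are synchronized so the timestamps are comparable.

```python
from datetime import datetime

# Timestamp layout used by the log lines above, e.g. "2016-03-24 14:33:25,886".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23  # length of "YYYY-MM-DD HH:MM:SS,mmm"


def parse_ts(line):
    """Return the leading timestamp, or None for stack trace and other continuation lines."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def build_timeline(logs):
    """Merge {source: [lines]} into a list of (time, source, line) sorted by time."""
    events = []
    for source, lines in logs.items():
        for line in lines:
            ts = parse_ts(line)
            if ts is not None:
                events.append((ts, source, line))
    return sorted(events, key=lambda event: event[0])


# A few lines taken from the excerpts above; the source labels are hypothetical.
logs = {
    "node4": [
        "2016-03-24 14:33:58,208 WARN org.mortbay.log: EXCEPTION",
        "javax.net.ssl.SSLException: Received close_notify during handshake",
        "2016-03-24 14:35:51,768 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session disconnected",
    ],
    "node3": [
        "2016-03-24 14:35:36,047 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to active state",
    ],
    "zk": [
        "2016-03-24 14:33:25,695 ERROR org.apache.zookeeper.server.NIOServerCnxn: Unexpected Exception:",
        "java.nio.channels.CancelledKeyException",
    ],
}

for ts, source, line in build_timeline(logs):
    print(ts.strftime("%H:%M:%S"), source, line[TS_LENGTH + 1:])
```

Ordered this way, the sample lines show the ZooKeeper server exception first, then the SSL exception on Node 4, then Node 3 transitioning to active, and finally the Node 4 state store session being reported as disconnected.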

 

