On BDA 4.2 the Cloudera Manager Agent is Not Able to Communicate with this Role's Web Server (Doc ID 2122658.1)

Last updated on APRIL 01, 2016

Applies to:

Big Data Appliance Integrated Software - Version 4.2.0 and later
Linux x86-64

Symptoms

The alerts below are received from Node 4. For some time, Node 4 was disconnected from Cloudera Manager (CM).

The health test result for RESOURCE_MANAGER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.
Time: Mar 24, 2016 2:35:56 PM
View Details on <bdanode03.example.com>
Monitor Startup: false
Role: resourcemanager (<bdanode04>)
Role Type: ResourceManager
Cluster: <cluster name>
Cluster Display Name: <cluster name>
Service: yarn
Service Display Name: yarn
Service Type: YARN (MR2 Included)
Hosts: <bdanode04.example.com>

Health Test Name: RESOURCE_MANAGER_WEB_METRIC_COLLECTION
Event Code: Role health test bad
Severity: Critical
Content: The health test result for RESOURCE_MANAGER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.
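
The alert reports that the agent cannot reach the ResourceManager role's web server. As a rough way to reproduce that check by hand, the sketch below attempts a TLS handshake against the role's web port and reports whether it completes. It is an illustration only, not how the Cloudera Manager Agent itself performs the health test. The function name, the host placeholder, and port 8090 (the stock default for yarn.resourcemanager.webapp.https.address, which may differ on a given cluster) are assumptions.

```python
import socket
import ssl


def probe_role_web_server(host, port, timeout=5.0):
    """Attempt a TLS handshake with host:port and return (ok, detail)."""
    # Certificate verification is switched off on purpose: this probe only
    # checks reachability, not trust. Do not reuse this context elsewhere.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, "handshake ok, protocol " + str(tls.version())
    except OSError as exc:
        # Covers refused connections, timeouts, and ssl.SSLError alike.
        return False, type(exc).__name__ + ": " + str(exc)


if __name__ == "__main__":
    # Hypothetical target: replace with the real ResourceManager host and port.
    ok, detail = probe_role_web_server("bdanode04.example.com", 8090)
    print("reachable" if ok else "NOT reachable", "-", detail)
```

A failed probe from the Cloudera Manager host while the role process is still running would be consistent with the handshake exceptions in the role log, but it does not by itself identify the cause.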

 

From the Node 4 log:

2016-03-24 14:33:25,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected
2016-03-24 14:33:58,208 WARN org.mortbay.log: EXCEPTION
javax.net.ssl.SSLException: Received close_notify during handshake
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
...
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
2016-03-24 14:35:51,667 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26811ms for sessionid 0x3538c34053a004a, closing socket connection and attempting reconnect
2016-03-24 14:35:51,668 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 46663ms for sessionid 0x2538c18065b2b13, closing socket connection and attempting reconnect
2016-03-24 14:35:51,680 WARN org.mortbay.log: EXCEPTION
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
...
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.io.EOFException: SSL peer shut down incorrectly
at sun.security.ssl.InputRecord.read(InputRecord.java:505)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
... 5 more
2016-03-24 14:35:51,698 WARN org.mortbay.log: EXCEPTION
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:980)
...
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
... 5 more
2016-03-24 14:35:51,768 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:Disconnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2016-03-24 14:35:51,768 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session disconnected
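
The two ClientCnxn messages above show the ResourceManager's ZooKeeper sessions going silent for roughly 27 and 47 seconds. When a log contains many such entries, a small parser can pull out the timestamp, session ID, and silence duration for each one. The sketch below is a minimal example assuming the message layout shown in this excerpt; the function and variable names are illustrative.

```python
import re

# Matches the ClientCnxn "session timed out" message as it appears above.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"org\.apache\.zookeeper\.ClientCnxn: Client session timed out, "
    r"have not heard from server in (?P<ms>\d+)ms "
    r"for sessionid (?P<sid>0x[0-9a-fA-F]+)"
)


def find_session_timeouts(lines):
    """Return (timestamp, session_id, silent_ms) for each timeout line."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            hits.append((match.group("ts"), match.group("sid"), int(match.group("ms"))))
    return hits


# Lines copied from the Node 4 excerpt above.
sample = [
    "2016-03-24 14:35:51,667 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26811ms for sessionid 0x3538c34053a004a, closing socket connection and attempting reconnect",
    "2016-03-24 14:35:51,668 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 46663ms for sessionid 0x2538c18065b2b13, closing socket connection and attempting reconnect",
    "2016-03-24 14:35:51,680 WARN org.mortbay.log: EXCEPTION",
]

for ts, sid, ms in find_session_timeouts(sample):
    print(f"{ts} session {sid} silent for {ms / 1000:.1f}s")
```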

The Node 3 log shows:

2016-03-24 14:35:15,565 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: AuthenticationToken expired
2016-03-24 14:35:15,570 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: AuthenticationToken expired
2016-03-24 14:35:30,559 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: AuthenticationToken expired
2016-03-24 14:35:30,562 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: AuthenticationToken expired
2016-03-24 14:35:36,021 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2016-03-24 14:35:36,034 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a067961726e524d1204726d3136
2016-03-24 14:35:36,034 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /yarn-leader-election/yarnRM/ActiveBreadCrumb to indicate that the local node is the most recent active...
2016-03-24 14:35:36,045 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/var/run/cloudera-scm-agent/process/12223-yarn-RESOURCEMANAGER/yarn-site.xml
2016-03-24 14:35:36,047 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS
2016-03-24 14:35:36,047 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to active state
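
The "Old node exists" line prints the previous active's breadcrumb as a hex string. Assuming the payload is the small protobuf message that YARN ResourceManager HA writes to the election znodes (two length-delimited string fields, cluster ID and RM ID), it can be decoded by hand to see which ResourceManager held the active role before the failover. The decoder below is a deliberately minimal sketch: it handles only single-byte tags and lengths, and the function name is illustrative.

```python
def decode_breadcrumb(hex_payload):
    """Decode length-delimited string fields from a small protobuf payload."""
    data = bytes.fromhex(hex_payload)
    fields = {}
    pos = 0
    while pos < len(data):
        tag = data[pos]
        field_number, wire_type = tag >> 3, tag & 0x07
        if tag >= 0x80 or wire_type != 2:
            raise ValueError("only single-byte, length-delimited tags are handled")
        if pos + 1 >= len(data) or data[pos + 1] >= 0x80:
            raise ValueError("only single-byte lengths are handled")
        start = pos + 2
        end = start + data[pos + 1]
        if end > len(data):
            raise ValueError("field runs past the end of the payload")
        fields[field_number] = data[start:end].decode("utf-8")
        pos = end
    return fields


# Hex string copied from the "Old node exists" line above.
info = decode_breadcrumb("0a067961726e524d1204726d3136")
print(info)  # {1: 'yarnRM', 2: 'rm16'}
```

Field 1 decodes to yarnRM, which matches the /yarn-leader-election/yarnRM path in the following log line, and field 2 decodes to rm16. Which host rm16 maps to depends on the cluster's yarn-site.xml and is not stated in these logs.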

The log from what is believed to have been the ZooKeeper (ZK) leader shows:

2016-03-24 14:33:25,686 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /192.168.8.4:20783 which had sessionid 0x35387a142640502
2016-03-24 14:33:25,695 ERROR org.apache.zookeeper.server.NIOServerCnxn: Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1075)
at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1112)
at org.apache.zookeeper.server.DataTree.setWatches(DataTree.java:1327)
at org.apache.zookeeper.server.ZKDatabase.setWatches(ZKDatabase.java:384)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:304)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2016-03-24 14:33:25,696 ERROR org.apache.zookeeper.server.NIOServerCnxn: Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1075)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2016-03-24 14:33:50,553 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /192.168.8.3:44312
2016-03-24 14:33:50,553 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /192.168.8.3:44312
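
Because the excerpts come from three different hosts, it can help to interleave them into one timeline when correlating the ZooKeeper errors, the SSL handshake failures, and the ResourceManager failover. The sketch below merges lines by their leading timestamp. It assumes the hosts' clocks are synchronized and skips continuation lines such as stack frames; the labels and function names are illustrative.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_timestamp(line):
    """Return the leading timestamp, or None for continuation lines."""
    try:
        return datetime.strptime(line[:23], TS_FORMAT)
    except ValueError:
        return None


def merge_logs(sources):
    """Merge {label: [lines]} into one list of (time, label, message), oldest first."""
    events = []
    for label, lines in sources.items():
        for line in lines:
            stamp = parse_timestamp(line)
            if stamp is not None:
                events.append((stamp, label, line[24:]))
    events.sort(key=lambda event: event[0])
    return events


# A few lines copied from the excerpts in this note.
sources = {
    "node4": [
        "2016-03-24 14:33:58,208 WARN org.mortbay.log: EXCEPTION",
        "2016-03-24 14:35:51,768 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session disconnected",
    ],
    "node3": [
        "2016-03-24 14:35:36,047 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to active state",
    ],
    "zk": [
        "2016-03-24 14:33:25,695 ERROR org.apache.zookeeper.server.NIOServerCnxn: Unexpected Exception:",
        "java.nio.channels.CancelledKeyException",
    ],
}

for stamp, label, message in merge_logs(sources):
    print(f"{stamp.strftime('%H:%M:%S')} [{label}] {message}")
```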

 

Cause
