Cloudera Manager Randomly Generates Alerts Even to the Point of Bringing Hue and Other Services into "Bad" Health
(Doc ID 2241305.1)
Last updated on JANUARY 22, 2020
Applies to: Big Data Appliance Integrated Software - Version 4.5.0 and later
Cloudera Manager (CM) randomly generates multiple alerts. In some cases, Hue and other services go into "Bad" health.
1. Alerts can look like the following:
Role: Hue Server (bdanode04)
HUE_SERVER_WEB_METRIC_COLLECTION Role health test bad Critical The health test result for HUE_SERVER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.
Role: JournalNode (bdanode03)
JOURNAL_NODE_WEB_METRIC_COLLECTION Role health test bad Critical The health test result for JOURNAL_NODE_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent got an unexpected response from this role's web server.
HUE_HUE_SERVERS_HEALTHY Service health test bad Critical The health test result for HUE_HUE_SERVERS_HEALTHY has become bad: Healthy Hue Server: 0. Concerning Hue Server: 0. Total Hue Server: 1. Percent healthy: 0.00%. Percent healthy or concerning: 0.00%. Critical threshold: 51.00%.
HDFS_DATA_NODES_HEALTHY Service health test bad Critical The health test result for HDFS_DATA_NODES_HEALTHY has become bad: Healthy DataNode: 0. Concerning DataNode: 0. Total DataNode: 18. Percent healthy: 0.00%. Percent healthy or concerning: 0.00%. Critical threshold: 90.00%.
ZOOKEEPER_CANARY_HEALTH Service health test bad Critical The health test result for ZOOKEEPER_CANARY_HEALTH has become bad: Canary test failed to establish a connection or a client session to the ZooKeeper service.
Role: JournalNode <CLUSTER_NAME>
ZOOKEEPER_SERVER_QUORUM_MEMBERSHIP Role health test bad Critical The health test result for ZOOKEEPER_SERVER_QUORUM_MEMBERSHIP has become bad: Quorum membership status could not be detected for the last 3 minute(s). At the last connection attempt the ZooKeeper server was in leader election.
2. Opening the Hue Web UI can bring up a blank page, and the Hue Server (Home > Hue > Instances > Hue Server) occasionally shows bad status with the following error:
Test of whether this role's web server is responding to requests for metrics.
Bad : The Cloudera Manager Agent is not able to communicate with this role's web server.
3. On the node with the Hue Server role (Node 4 by default), /var/log/hue/error.log shows errors like the following:
file_reporter ERROR failed to write metrics to file
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hue/build/env/lib/python2.6/site-packages/MySQL_python-1.2.5-py2.6-linux-x86_64.egg/MySQLdb/connections.py", line 193, in __init__
super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (1040, 'Too many connections')
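To gauge how often this error occurs, the Hue error log can be scanned for MySQL OperationalError lines. The following is a minimal Python sketch, not part of the original note. The function names count_mysql_errors and report are hypothetical, and the pattern assumes the error lines look like the excerpt above. Error 1040 is the code MySQL returns when the server has reached its connection limit.

```python
import re
from collections import Counter

# Matches lines such as: OperationalError: (1040, 'Too many connections')
MYSQL_ERROR = re.compile(r"OperationalError: \((\d+), '([^']*)'\)")


def count_mysql_errors(lines):
    """Count (code, message) pairs for MySQL OperationalError lines."""
    counts = Counter()
    for line in lines:
        match = MYSQL_ERROR.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


def report(path):
    """Print one line per distinct error, most frequent first.

    Hypothetical helper: 'path' would be the Hue error log,
    for example /var/log/hue/error.log on the Hue Server node.
    """
    with open(path, errors="replace") as log:
        for (code, message), n in count_mysql_errors(log).most_common():
            print(f"{n:6d}  ({code}) {message}")
```

Calling report("/var/log/hue/error.log") on the Hue Server node would print a count for each distinct error code. A large count for 1040 is consistent with the symptom described here, although it does not by itself identify which client is exhausting the connections.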
4. Checking the MySQL master status intermittently fails with: