Cloudera Manager Randomly Generates Alerts Even to the Point of Bringing Hue and Other Services into "Bad" Health
(Doc ID 2241305.1)
Last updated on AUGUST 17, 2022
Applies to:
Big Data Appliance Integrated Software - Version 4.5.0 and later
Linux x86-64
Symptoms
Cloudera Manager (CM) randomly generates multiple alerts. In severe cases, Hue and other services go into "Bad" health.
1. The alerts can look like the following:
Role: Hue Server (bdanode04)
HUE_SERVER_WEB_METRIC_COLLECTION Role health test bad Critical The health test result for HUE_SERVER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.
Role: JournalNode (bdanode03)
JOURNAL_NODE_WEB_METRIC_COLLECTION Role health test bad Critical The health test result for JOURNAL_NODE_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent got an unexpected response from this role's web server.
Cluster: <CLUSTER_NAME>
HUE_HUE_SERVERS_HEALTHY Service health test bad Critical The health test result for HUE_HUE_SERVERS_HEALTHY has become bad: Healthy Hue Server: 0. Concerning Hue Server: 0. Total Hue Server: 1. Percent healthy: 0.00%. Percent healthy or concerning: 0.00%. Critical threshold: 51.00%.
Cluster: <CLUSTER_NAME>
HDFS_DATA_NODES_HEALTHY Service health test bad Critical The health test result for HDFS_DATA_NODES_HEALTHY has become bad: Healthy DataNode: 0. Concerning DataNode: 0. Total DataNode: 18. Percent healthy: 0.00%. Percent healthy or concerning: 0.00%. Critical threshold: 90.00%.
Cluster: <CLUSTER_NAME>
ZOOKEEPER_CANARY_HEALTH Service health test bad Critical The health test result for ZOOKEEPER_CANARY_HEALTH has become bad: Canary test failed to establish a connection or a client session to the ZooKeeper service.
Role: JournalNode <CLUSTER_NAME>
ZOOKEEPER_SERVER_QUORUM_MEMBERSHIP Role health test bad Critical The health test result for ZOOKEEPER_SERVER_QUORUM_MEMBERSHIP has become bad: Quorum membership status could not be detected for the last 3 minute(s). At the last connection attempt the ZooKeeper server was in leader election.
2. Opening the Hue Web UI can bring up a blank page, and the Hue Server (Home > Hue > Instances > Hue Server) occasionally shows Bad status with the following error:
Test of whether this role's web server is responding to requests for metrics.
Bad : The Cloudera Manager Agent is not able to communicate with this role's web server.
3. On the node hosting the Hue Server role (Node 4 by default), /var/log/hue/error.log shows errors like the following:
file_reporter ERROR failed to write metrics to file
Traceback (most recent call last):
...
File "/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hue/build/env/lib/python2.6/site-packages/MySQL_python-1.2.5-py2.6-linux-x86_64.egg/MySQLdb/connections.py", line 193, in __init__
super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (1040, 'Too many connections')
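To gauge how often this error occurs, the log can be scanned for the message shown above. The following is a minimal sketch and not part of the original note: the function names are hypothetical, and it assumes the error always appears on a single line in the form shown in the excerpt.

```python
import re

# Pattern for the MySQL client error shown in the excerpt above.
# Assumption: the error is always logged on one line in this form.
TOO_MANY_CONNECTIONS = re.compile(r"OperationalError: \(1040,")


def count_too_many_connections(lines):
    """Return the number of lines that contain MySQL error 1040."""
    return sum(1 for line in lines if TOO_MANY_CONNECTIONS.search(line))


def scan_log(path="/var/log/hue/error.log"):
    """Count error 1040 occurrences in a log file (hypothetical helper)."""
    with open(path, errors="replace") as handle:
        return count_too_many_connections(handle)


if __name__ == "__main__":
    # Sample lines copied from the excerpt above; no file access is needed.
    sample = [
        "file_reporter ERROR failed to write metrics to file",
        "Traceback (most recent call last):",
        "OperationalError: (1040, 'Too many connections')",
    ]
    print(count_too_many_connections(sample))
```

On the Hue Server node, scan_log() can be called with the log path to get a count for the whole file. A count that keeps growing suggests the connection limit is being hit repeatedly rather than once.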
4. Checking the MySQL master status intermittently fails with:
Cause
In this Document
Symptoms
Cause
Solution