
Cloudera Manager Randomly Generates Alerts Even to the Point of Bringing Hue and Other Services into "Bad" Health (Doc ID 2241305.1)

Last updated on JANUARY 22, 2020

Applies to:

Big Data Appliance Integrated Software - Version 4.5.0 and later
Linux x86-64

Symptoms

NOTE: In the examples that follow, user details, cluster names, hostnames, directory paths, filenames, etc. represent a fictitious sample (and are used to provide an illustrative example only). Any similarity to actual persons, or entities, living or dead, is purely coincidental and not intended in any manner.

Cloudera Manager (CM) randomly generates multiple alerts. In some cases the alerts escalate until Hue and other services report "Bad" health.

1. The alerts can look like the following:

Role: Hue Server (bdanode04)
HUE_SERVER_WEB_METRIC_COLLECTION Role health test bad Critical The health test result for HUE_SERVER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.

Role: JournalNode (bdanode03)
JOURNAL_NODE_WEB_METRIC_COLLECTION Role health test bad Critical The health test result for JOURNAL_NODE_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent got an unexpected response from this role's web server.

Cluster: <CLUSTER_NAME>
HUE_HUE_SERVERS_HEALTHY Service health test bad Critical The health test result for HUE_HUE_SERVERS_HEALTHY has become bad: Healthy Hue Server: 0. Concerning Hue Server: 0. Total Hue Server: 1. Percent healthy: 0.00%. Percent healthy or concerning: 0.00%. Critical threshold: 51.00%.

Cluster: <CLUSTER_NAME>
HDFS_DATA_NODES_HEALTHY Service health test bad Critical The health test result for HDFS_DATA_NODES_HEALTHY has become bad: Healthy DataNode: 0. Concerning DataNode: 0. Total DataNode: 18. Percent healthy: 0.00%. Percent healthy or concerning: 0.00%. Critical threshold: 90.00%.

Cluster: <CLUSTER_NAME>
ZOOKEEPER_CANARY_HEALTH Service health test bad Critical The health test result for ZOOKEEPER_CANARY_HEALTH has become bad: Canary test failed to establish a connection or a client session to the ZooKeeper service.

Role: JournalNode <CLUSTER_NAME>
ZOOKEEPER_SERVER_QUORUM_MEMBERSHIP Role health test bad Critical The health test result for ZOOKEEPER_SERVER_QUORUM_MEMBERSHIP has become bad: Quorum membership status could not be detected for the last 3 minute(s). At the last connection attempt the ZooKeeper server was in leader election.
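The HUE_HUE_SERVERS_HEALTHY and HDFS_DATA_NODES_HEALTHY alerts above report role counts together with a critical threshold. When many of these alerts arrive at once, it can help to pull the numbers out programmatically. The following is a minimal Python sketch, not a Cloudera tool: `parse_health_message`, `RoleHealthSummary`, and `looks_bad` are hypothetical names, and the rule that a test is bad once the healthy-or-concerning percentage drops below the critical threshold is an assumption read off the message wording, not taken from Cloudera Manager documentation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical helper: extracts the role counts and the critical threshold from
# a Cloudera Manager "..._HEALTHY" service health-test message.
_HEALTH_RE = re.compile(
    r"Healthy (?P<role>[\w ]+?): (?P<healthy>\d+)\. "
    r"Concerning (?P=role): (?P<concerning>\d+)\. "
    r"Total (?P=role): (?P<total>\d+)\..*?"
    r"Critical threshold: (?P<critical>\d+(?:\.\d+)?)%"
)


@dataclass
class RoleHealthSummary:
    role: str
    healthy: int
    concerning: int
    total: int
    critical_threshold: float  # percent, as printed in the alert

    def percent_healthy(self) -> float:
        return 100.0 * self.healthy / self.total if self.total else 0.0

    def percent_healthy_or_concerning(self) -> float:
        ok = self.healthy + self.concerning
        return 100.0 * ok / self.total if self.total else 0.0

    def looks_bad(self) -> bool:
        # ASSUMPTION: "bad" when healthy-or-concerning roles fall below the
        # critical threshold; this mirrors the message text, not CM internals.
        return self.percent_healthy_or_concerning() < self.critical_threshold


def parse_health_message(message: str) -> Optional[RoleHealthSummary]:
    """Return the parsed counts, or None if the message has a different shape."""
    m = _HEALTH_RE.search(message)
    if m is None:
        return None
    return RoleHealthSummary(
        role=m.group("role"),
        healthy=int(m.group("healthy")),
        concerning=int(m.group("concerning")),
        total=int(m.group("total")),
        critical_threshold=float(m.group("critical")),
    )
```

Applied to the HDFS_DATA_NODES_HEALTHY alert above, the sketch reports 0 of 18 DataNodes healthy against a 90% critical threshold, so `looks_bad()` returns True. Messages of other shapes, such as the ZooKeeper canary alert, return None.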

2. Opening the Hue Web UI can return a blank page, and the Hue Server role (Home > Hue > Instances > Hue Server) occasionally shows Bad status with the following error:

Web Server Status ( <CLUSTER_NAME> hue Hue Server bdanode04.example.com ) February 23, 2017, 3:31 PM EST
Test of whether this role's web server is responding to requests for metrics.
Bad : The Cloudera Manager Agent is not able to communicate with this role's web server.
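The web server tests above distinguish a role web server the agent cannot reach at all ("not able to communicate") from one that answers badly ("got an unexpected response"). To check the same distinction by hand from another node, a generic HTTP probe such as the Python sketch below can be used. It is an illustration under assumptions: `probe_web_server` is a hypothetical helper, it does not reproduce the exact request the Cloudera Manager Agent sends, and any URL passed to it (for example one using port 8888, Hue's usual default HTTP port) should match the host and port actually configured for the role.

```python
import http.client
import urllib.error
import urllib.request


def probe_web_server(url: str, timeout: float = 5.0) -> str:
    """Classify a role web server as "ok", "unexpected", or "unreachable".

    Hypothetical helper: a generic HTTP reachability check, not the exact
    request the Cloudera Manager Agent makes.
    """
    # Bypass any http_proxy/https_proxy settings so the node is contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return "ok"  # 2xx response, after following any redirects
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500 or 503.
        err.close()
        return "unexpected"
    except http.client.HTTPException:
        # The server answered with something that is not valid HTTP.
        return "unexpected"
    except OSError:
        # Connection refused, DNS failure, timeout, etc. (URLError is an OSError).
        return "unreachable"
```

For example, `probe_web_server("http://bdanode04.example.com:8888/")` returning "unreachable" would be consistent with the "not able to communicate" alert, while "unexpected" corresponds more closely to the "unexpected response" message reported for the JournalNode.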

3. On the node that hosts the Hue Server role (Node 4 by default), /var/log/hue/error.log shows errors like the following:

fsmanager ERROR Failed to get filesystem called "<VALUE>" for "<VALUE>" schema: Filesystem not configured for <VALUE>
file_reporter ERROR failed to write metrics to file
Traceback (most recent call last):
...
  File "/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hue/build/env/lib/python2.6/site-packages/MySQL_python-1.2.5-py2.6-linux-x86_64.egg/MySQLdb/connections.py", line 193, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (1040, 'Too many connections')
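To see how often these errors recur on the Hue Server node, the log can be scanned for the two signatures shown above: the `file_reporter` metrics-write failure and the MySQLdb `OperationalError` code, where 1040 is MySQL's "Too many connections" error. A minimal Python sketch follows; `summarize_hue_errors` and `summarize_hue_error_log` are hypothetical names, and reading /var/log/hue/error.log may require elevated privileges.

```python
import re
from collections import Counter
from typing import Iterable

# Matches MySQLdb errors as they appear in the Hue traceback, e.g.
#   OperationalError: (1040, 'Too many connections')
_MYSQL_ERR = re.compile(r"OperationalError: \((\d+), '([^']*)'\)")


def summarize_hue_errors(lines: Iterable[str]) -> Counter:
    """Count metrics-write failures and MySQL OperationalError codes."""
    counts: Counter = Counter()
    for line in lines:
        if "failed to write metrics to file" in line:
            counts["metrics_write_failed"] += 1
        match = _MYSQL_ERR.search(line)
        if match:
            counts[f"mysql_error_{match.group(1)}"] += 1
    return counts


def summarize_hue_error_log(path: str = "/var/log/hue/error.log") -> Counter:
    # errors="replace" keeps the scan going if the log holds non-UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return summarize_hue_errors(handle)
```

A non-zero `mysql_error_1040` count indicates that Hue's MySQL server repeatedly reached its connection limit and refused new connections.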

4. Checking the MySQL master status intermittently fails with: 

Cause
