Cloudera Manager Randomly Generates Alerts Even to the Point of Bringing Hue and Other Services into "Bad" Health

(Doc ID 2241305.1)

Last updated on January 22, 2018

Applies to:

Big Data Appliance Integrated Software - Version 4.5.0 and later
Linux x86-64

Symptoms

Cloudera Manager (CM) intermittently raises multiple alerts with no obvious trigger. In some cases Hue and other services end up in "Bad" health.

1. The alerts can look like the following:

Role: Hue Server (bdanode04)
HUE_SERVER_WEB_METRIC_COLLECTION Role health test bad Critical The health test result for HUE_SERVER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.

Role: JournalNode (bdanode03)
JOURNAL_NODE_WEB_METRIC_COLLECTION Role health test bad Critical The health test result for JOURNAL_NODE_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent got an unexpected response from this role's web server.

Cluster: <cluster_name>
HUE_HUE_SERVERS_HEALTHY Service health test bad Critical The health test result for HUE_HUE_SERVERS_HEALTHY has become bad: Healthy Hue Server: 0. Concerning Hue Server: 0. Total Hue Server: 1. Percent healthy: 0.00%. Percent healthy or concerning: 0.00%. Critical threshold: 51.00%.

Cluster: <cluster_name>
HDFS_DATA_NODES_HEALTHY Service health test bad Critical The health test result for HDFS_DATA_NODES_HEALTHY has become bad: Healthy DataNode: 0. Concerning DataNode: 0. Total DataNode: 18. Percent healthy: 0.00%. Percent healthy or concerning: 0.00%. Critical threshold: 90.00%.

Cluster: <cluster_name>
ZOOKEEPER_CANARY_HEALTH Service health test bad Critical The health test result for ZOOKEEPER_CANARY_HEALTH has become bad: Canary test failed to establish a connection or a client session to the ZooKeeper service.

Role: JournalNode <cluster_name>
ZOOKEEPER_SERVER_QUORUM_MEMBERSHIP Role health test bad Critical The health test result for ZOOKEEPER_SERVER_QUORUM_MEMBERSHIP has become bad: Quorum membership status could not be detected for the last 3 minute(s). At the last connection attempt the ZooKeeper server was in leader election.

 

2. Opening the Hue Web UI can bring up a blank page, and the Hue Server (Home > Hue > Instances > Hue Server) occasionally shows "Bad" status with the following error:

Web Server Status ( <cluster_name> hue Hue Server bdanode04.domain.com ) February 23, 2017, 3:31 PM EST
Test of whether this role's web server is responding to requests for metrics.
Bad : The Cloudera Manager Agent is not able to communicate with this role's web server.

3. On the node with the Hue Server role (Node 4 by default), /var/log/hue/error.log shows errors like the following:

[XX/XXX/XXXX XX:XX:XX +XXXX] fsmanager ERROR Failed to get filesystem called "XX" for "XX" schema: Filesystem not configured for XX
[XX/XXX/XXXX XX:XX:XX +XXXX] fsmanager ERROR Failed to get filesystem called "XX" for "XX" schema: Filesystem not configured for XX
[XX/XXX/XXXX XX:XX:XX +XXXX] file_reporter ERROR failed to write metrics to file
Traceback (most recent call last):
...
File "/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hue/build/env/lib/python2.6/site-packages/MySQL_python-1.2.5-py2.6-linux-x86_64.egg/MySQLdb/connections.py", line 193, in __init__
super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (1040, 'Too many connections')
[XX/XXX/XXXX XX:XX:XX +XXXX] fsmanager ERROR Failed to get filesystem called "XX" for "XX" schema: Filesystem not configured for XX 
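To gauge how often the MySQL connection limit is being hit, the error log can be scanned for the patterns shown above. The following is a minimal Python sketch, not an Oracle or Cloudera tool: the function name summarize_hue_errors and both regular expressions are assumptions based only on the log excerpt in this note, and they may need adjusting for other Hue versions.

```python
import re
from collections import Counter

# MySQL error 1040 as it appears in the Hue traceback quoted in this note.
TOO_MANY_CONNECTIONS = re.compile(
    r"OperationalError: \(1040, 'Too many connections'\)"
)

# Assumed log line shape, taken from the excerpt above:
#   [timestamp] <logger> ERROR <message>
ERROR_LINE = re.compile(r"^\[[^\]]*\]\s+(\S+)\s+ERROR\b")


def summarize_hue_errors(lines):
    """Return (ERROR count per logger, count of MySQL 1040 errors).

    `lines` is any iterable of strings, for example an open file object.
    This helper is hypothetical and only illustrates the check.
    """
    per_logger = Counter()
    too_many = 0
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:
            # Group 1 is the logger name, e.g. "fsmanager" or "file_reporter".
            per_logger[match.group(1)] += 1
        if TOO_MANY_CONNECTIONS.search(line):
            too_many += 1
    return per_logger, too_many
```

On the Hue Server node this could be called as `summarize_hue_errors(open("/var/log/hue/error.log", errors="replace"))`. A nonzero second value means the Hue database connections were refused by MySQL at least once in the period covered by the log.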

4. Checking the MySQL master status intermittently fails with: 

 

Cause
