Node(s) Lose Connection with CM with Agents Reporting: "This host is in Contact with Cloudera Manager Server. This host is not in contact with Host Monitor"

(Doc ID 2363310.1)

Last updated on FEBRUARY 25, 2018

Applies to:

Big Data Appliance Integrated Software - Version 4.0 and later
Linux x86-64

Symptoms

The reported symptoms are:

1. One or more nodes lose connection with Cloudera Manager (CM).

2. The hosts losing connection with CM are repeatedly reported as having 'good' and then 'bad' health.

3. For the host(s) in this state, CM reports the Agent Status as:

This host is in Contact with Cloudera Manager Server. This host is not in contact with Host Monitor

4. The associated cloudera-scm-agent log, /var/log/cloudera-scm-agent/cloudera-scm-agent.log, shows:

a) That the CM Agent is not able to get the details of the process directory:

[<timestamp>] xxxx Monitor-HostMonitor throttling_logger ERROR (9 skipped) Could not find local file system for /var/run/cloudera-scm-agent/process

b) That the CM Agent is not able to fetch metrics:

[<timestamp>] xxxx Monitor-GenericMonitor throttling_logger ERROR (10 skipped) Error fetching metrics at 'http://bdanode0x.example.com:50070/jmx'
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.9.0-py2.6.egg/cmf/monitor/generic/metric_collectors.py", line 200, in _collect_and_parse_and_return
    self._adapter.safety_valve))

5. Restarting the cloudera-scm-agent on the host does not resolve the error.
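
To confirm that a host matches symptom 4, the agent log can be scanned for the two error messages quoted above. The following is a minimal sketch, not an Oracle or Cloudera tool: the function names and the SYMPTOMS dictionary are hypothetical, the script assumes Python 3 is available on the machine where it runs (the agent itself ships with Python 2.6), and it only counts matching lines without diagnosing the cause.

```python
import os

# Default agent log location, as given in symptom 4 of this note.
AGENT_LOG = "/var/log/cloudera-scm-agent/cloudera-scm-agent.log"

# Message fragments quoted in this note. The dictionary keys are
# hypothetical labels chosen for this sketch.
SYMPTOMS = {
    "process_dir": "Could not find local file system for "
                   "/var/run/cloudera-scm-agent/process",
    "metrics_fetch": "Error fetching metrics at",
}


def count_symptoms(lines):
    """Return how many of the given log lines contain each symptom message."""
    counts = {name: 0 for name in SYMPTOMS}
    for line in lines:
        for name, fragment in SYMPTOMS.items():
            if fragment in line:
                counts[name] += 1
    return counts


def scan_agent_log(path=AGENT_LOG):
    """Count symptom messages in the agent log at the given path."""
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with open(path, errors="replace") as handle:
        return count_symptoms(handle)


if __name__ == "__main__":
    if os.path.isfile(AGENT_LOG):
        for name, count in scan_agent_log().items():
            print("{}: {}".format(name, count))
    else:
        print("Agent log not found at " + AGENT_LOG)
```

Non-zero counts for both labels indicate that the host is logging the same errors described in 4a and 4b. Because the agent throttles these messages (note the "skipped" counters in the log lines), the counts understate how often the errors actually occur.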

Cause
