Job Errors During Restart of Cluster Services Report "File /opt/cloudera must be owned by root, but is owned by 495"

(Doc ID 2354913.1)

Last updated on JANUARY 30, 2018

Applies to:

Big Data Appliance Integrated Software - Version 4.1.0 and later
Linux x86-64


While a long-running job is executing, job errors occur during a restart of cluster services.

The following error appears on most containers when the cluster restart takes place:

<timestamp> INFO mapreduce.Job: Task Id : attempt_xxxx, Status : FAILED
Container exited with a non-zero exit code 154
Exception when trying to cleanup container container_xxxx: Problem signalling
container xxxx with SIGTERM; output: and exitCode: 24
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(
at org.apache.hadoop.yarn.event.AsyncDispatcher$
Caused by: ExitCodeException exitCode=24: File /opt/cloudera must be owned by root, but is owned by 495
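
When scanning many container logs for this failure, it can help to pull the path and the offending UID out of the "Caused by" line. The following is a minimal sketch, not part of the original note: the names OwnershipError and parse_ownership_error are hypothetical, and the pattern assumes the message wording matches the excerpt above exactly.

```python
import re
from typing import NamedTuple, Optional


class OwnershipError(NamedTuple):
    """Fields extracted from a 'must be owned by' message (hypothetical helper type)."""
    path: str
    expected_owner: str
    actual_uid: int


# Assumption: the message wording is exactly as in the log excerpt above.
_PATTERN = re.compile(
    r"File (?P<path>\S+) must be owned by (?P<expected>\S+), "
    r"but is owned by (?P<uid>\d+)"
)


def parse_ownership_error(line: str) -> Optional[OwnershipError]:
    """Return the path, expected owner, and actual UID, or None if the line does not match."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return OwnershipError(
        path=match.group("path"),
        expected_owner=match.group("expected"),
        actual_uid=int(match.group("uid")),
    )
```

Applied to the last line of the excerpt, this should yield the path /opt/cloudera, the expected owner root, and the UID 495; lines that do not carry the message return None.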

The user 495 appears to be cloudera-scm.
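
One way to confirm which account a numeric UID belongs to, and who currently owns a directory, is to query the local password database on the affected node. The sketch below is illustrative only and is not taken from the original note: describe_owner and is_owned_by_root are hypothetical helper names, and the check assumes "owned by root" means UID 0.

```python
import os
import pwd
from typing import Optional, Tuple


def describe_owner(path: str) -> Tuple[int, Optional[str]]:
    """Return (uid, user name) for the owner of path; name is None if the UID is unknown locally."""
    uid = os.stat(path).st_uid
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        # The UID has no entry in the local password database.
        name = None
    return uid, name


def is_owned_by_root(path: str) -> bool:
    """Assumption: 'owned by root' means the owning UID is 0."""
    return os.stat(path).st_uid == 0


if __name__ == "__main__":
    target = "/opt/cloudera"  # path taken from the error message
    if os.path.exists(target):
        print(target, describe_owner(target), "root-owned:", is_owned_by_root(target))
```

Running this on a node that shows the error would be expected to report the same UID as the log message, together with whatever account name that UID maps to on that node.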

The running job is temporarily impacted, although it finishes after the new Resource Manager comes online.

Note that although the symptoms are very similar to those reported in "BDA V4.1 Node Reprovision Fails at Step 10 StartHadoopServices Due to Failed NodeManager with: /opt/cloudera must be owned by root, but is owned by 494" (Doc ID 1987282.1), the underlying cause is not the same.

