On Oracle Big Data Appliance BDR Hive Jobs Fail After Cluster Stop And Start (Doc ID 2052548.1)

Last updated on NOVEMBER 08, 2022

Applies to:

Big Data Appliance Integrated Software - Version 3.1.0 to 4.1.0 [Release 3.1 to 4.1]
Linux x86-64

Symptoms

Production and Disaster Recovery Oracle Big Data Appliances (BDAs) are stopped and started for a RAM upgrade.

After the restart, BDR Hive jobs scheduled on the Disaster Recovery cluster fail.

The message shown in Cloudera Manager (Replication section) for those job executions is:

"The remote replication task was not found, probably due to a server restart"

Restarting the Cloudera Manager Server on node03 does not resolve the issue.

The Cloudera Manager Server log files contain many messages like:

Replication result file '/var/lib/cloudera-scm-server/commands/nnnn/summarynnnnnnnnnnnnnnnnnnn.json' for command 'nnn' is missing.
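
To gauge how many replication commands are affected, the server log can be scanned for this message. The following is a minimal sketch rather than an Oracle or Cloudera tool: the function name find_missing_results is hypothetical, and the pattern is derived only from the message quoted above, so the wording may need adjusting for other Cloudera Manager versions.

```python
import re

# Pattern derived from the log message quoted above; the exact wording may
# differ between Cloudera Manager versions (assumption).
MISSING_RESULT = re.compile(
    r"Replication result file '(?P<path>[^']+)' "
    r"for command '(?P<command>[^']+)' is missing"
)


def find_missing_results(lines):
    """Return (command, path) pairs for each 'result file is missing' message."""
    hits = []
    for line in lines:
        match = MISSING_RESULT.search(line)
        if match:
            hits.append((match.group("command"), match.group("path")))
    return hits
```

The function accepts any iterable of lines, so an open file handle on the Cloudera Manager Server log can be passed directly. The log location is not given in this document and should be confirmed on the affected node.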

Trying to manually start one of the replication jobs fails with:

2015-08-28 15:00:00,022 ERROR [com.cloudera.cmf.scheduler-1_Worker-1:scheduler.CommandDispatcherJob@<JOBID>]
Skipping schedule execution since the state update failed due to an unexpected error. Schedule details: DbCommandSchedule{id=1, commandName=GlobalPoolsRefresh}
javax.persistence.RollbackException: Error while committing the transaction
  at org.hibernate.ejb.TransactionImpl.commit(TransactionImpl.java:92)
  at com.cloudera.enterprise.AbstractWrappedEntityManager.commit(AbstractWrappedEntityManager.java:110)
  at com.cloudera.cmf.persist.CmfEntityManager.commit(CmfEntityManager.java:366)
  at com.cloudera.cmf.scheduler.CommandDispatcherJob.execute(CommandDispatcherJob.java:135)
  at org.quartz.core.JobRunShell.run(JobRunShell.java:206)
  at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:548)
Caused by: javax.persistence.OptimisticLockException: org.hibernate.StaleObjectStateException:
Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect): [com.cloudera.cmf.model.DbCommandSchedule#1]
  at org.hibernate.ejb.AbstractEntityManagerImpl.wrapStaleStateException(AbstractEntityManagerImpl.java:1416)
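
To see which database rows the stale-state errors refer to, the entity name and row id can be pulled out of the "Row was updated or deleted by another transaction" lines. This is a hedged sketch based only on the trace above: the function name stale_entities is hypothetical, and the pattern assumes the entity reference appears on the same line as the message, as it does here.

```python
import re

# Matches the Hibernate stale-state message shown in the trace above and
# captures the entity class and row id from the trailing [Entity#id] part.
STALE_ROW = re.compile(
    r"Row was updated or deleted by another transaction"
    r".*?\[(?P<entity>[\w.]+)#(?P<id>\d+)\]"
)


def stale_entities(log_text):
    """Return sorted unique (entity class, row id) pairs from stale-state errors."""
    found = {
        (m.group("entity"), int(m.group("id")))
        for m in STALE_ROW.finditer(log_text)
    }
    return sorted(found)
```

Applied to the trace above, this would report the DbCommandSchedule row with id 1, which matches the schedule named in the "Skipping schedule execution" line.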

Cause
