
On Oracle Big Data Appliance with BDS enabled, Server Crash / Hang Noticed when Executing a Map Job (Doc ID 2014635.1)

Last updated on DECEMBER 04, 2019

Applies to:

Big Data Appliance Integrated Software - Version 4.1.0 and later
Linux x86-64

Symptoms

NOTE: In the examples that follow, user details, cluster names, hostnames, directory paths, filenames, etc. represent a fictitious sample (and are used to provide an illustrative example only). Any similarity to actual persons, or entities, living or dead, is purely coincidental and not intended in any manner.

On Oracle Big Data Appliance (BDA), executing a MapReduce (MR) job throws the errors below. Oracle Big Data SQL is enabled on the BDA, so cgroups are turned on. High CPU usage is also seen on the Resource Manager (RM) nodes, which causes the RM nodes to crash.

15/05/18 19:10:09 INFO mapreduce.Job: map 8% reduce 0%
15/05/18 19:26:09 INFO mapreduce.Job: Task Id : attempt_1431987421855_0002_m_000207_0, Status : FAILED
Error: java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:939)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:496)
15/05/18 19:28:22 INFO mapreduce.Job: map 9% reduce 0%
15/05/18 19:29:32 INFO mapreduce.Job: Task Id : attempt_1431987421855_0002_m_000314_0, Status : FAILED
Error: java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:939)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:496)
............................

15/05/18 19:44:12 INFO mapreduce.Job: Task Id : attempt_1431987421855_0002_m_000295_0, Status : FAILED
Error: java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:939)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:496)
................
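
When the client output is long, it can help to summarize which task attempts failed and with which error. The following is a minimal sketch, assuming the job output has been saved as text in the format shown above. The function name `summarize_failures` and the embedded sample are illustrative only and are not part of any Oracle or Hadoop tooling. The parser also tolerates a stack frame that appears on the same line as the error.

```python
import re
from collections import Counter

# Matches the "Task Id : attempt_..., Status : FAILED" lines printed by the MapReduce client.
FAILED_RE = re.compile(r"Task Id : (attempt_\d+_\d+_[mr]_\d+_\d+), Status : FAILED")


def summarize_failures(log_text):
    """Return (list of failed attempt ids, Counter of error messages)."""
    attempts = []
    errors = Counter()
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        match = FAILED_RE.search(line)
        if not match:
            continue
        attempts.append(match.group(1))
        # The client prints the exception on the line right after the status line.
        if i + 1 < len(lines) and lines[i + 1].startswith("Error: "):
            # Drop a stack frame that is fused onto the same line, if present.
            message = lines[i + 1][len("Error: "):].split(" at org.")[0].strip()
            errors[message] += 1
    return attempts, errors


# Hypothetical sample in the same format as the output above.
sample = """
15/05/18 19:26:09 INFO mapreduce.Job: Task Id : attempt_1431987421855_0002_m_000207_0, Status : FAILED
Error: java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:496)
15/05/18 19:28:22 INFO mapreduce.Job: map 9% reduce 0%
15/05/18 19:44:12 INFO mapreduce.Job: Task Id : attempt_1431987421855_0002_m_000295_0, Status : FAILED
Error: java.io.IOException: Failing write. Tried pipeline recovery 5 times without success. at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:939)
"""

failed_attempts, error_counts = summarize_failures(sample)
for attempt in failed_attempts:
    print(attempt)
for message, count in error_counts.items():
    print(count, message)
```

A single dominant error message across many attempts, as in this case, suggests a cluster-level problem rather than a fault in individual tasks.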


Increasing the following YARN service memory/heap parameters in Cloudera Manager resolved the high CPU usage:

mapreduce.map.java.opts.max.heap
mapreduce.map.memory.mb
yarn.nodemanager.resource.memory-mb

For details about YARN memory configuration settings, see the Cloudera documentation on "Managing YARN".
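
Before raising these three values, it can be useful to check that they are mutually consistent. The sketch below is illustrative and is not the solution from this note. It assumes the general YARN rules that the map JVM heap should fit inside the map container (`mapreduce.map.memory.mb`) with some headroom, and that a map container cannot be larger than the memory a NodeManager offers (`yarn.nodemanager.resource.memory-mb`). The function name, the 0.8 headroom ratio (a common rule of thumb), and all numeric values are assumptions. The check does not account for cgroups limits or for memory reserved by Oracle Big Data SQL.

```python
def check_yarn_memory(map_heap_mb, map_container_mb, nodemanager_mb, heap_ratio=0.8):
    """Return a list of warnings for an inconsistent trio of memory settings.

    map_heap_mb      -> mapreduce.map.java.opts.max.heap (in MB)
    map_container_mb -> mapreduce.map.memory.mb
    nodemanager_mb   -> yarn.nodemanager.resource.memory-mb
    heap_ratio       -> assumed rule-of-thumb ceiling for heap / container size
    """
    warnings = []
    if map_heap_mb >= map_container_mb:
        warnings.append("map heap does not fit inside the map container")
    elif map_heap_mb > heap_ratio * map_container_mb:
        warnings.append("map heap leaves little headroom for non-heap JVM memory")
    if map_container_mb > nodemanager_mb:
        warnings.append("map container is larger than the NodeManager memory")
    return warnings


def max_map_containers(map_container_mb, nodemanager_mb):
    """Upper bound on concurrent map containers per node, by memory alone."""
    return nodemanager_mb // map_container_mb


# Hypothetical values, for illustration only.
problems = check_yarn_memory(map_heap_mb=3276, map_container_mb=4096, nodemanager_mb=65536)
print(problems or "settings are mutually consistent")
print(max_map_containers(4096, 65536), "map containers fit per node by memory")
```

The per-node container count is worth comparing against the physical memory actually free on each node, since other services running on the same servers also need memory.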

However, after these changes, the BDA cluster became unstable while executing the Map job: a couple of nodes went into kernel panic, and on some nodes the Ethernet / Admin network went down.

Cause

To view full details, sign in with your My Oracle Support account.




