
Yarn Frequently Asked Questions (Doc ID 1910068.1)

Last updated on APRIL 08, 2020

Applies to:

Big Data Appliance Integrated Software - Version 2.5.0 and later
Linux x86-64

Purpose

This document provides answers to frequently asked questions about YARN as installed on Oracle Big Data Appliance (BDA).

Questions and Answers


In this Document
Purpose
Questions and Answers
 What is YARN?
 Will I need to make changes to my code or scripts when using YARN instead of MapReduce?
 Is there any way to calculate how many containers (2 MB / 1 CPU) a particular user might consume when we migrate to CDH 5 / YARN?
 My understanding is that in MR2, one can determine how many concurrent tasks are launched per node by dividing the resources allocated to YARN by the resources allocated to each MapReduce task, and taking the minimum across the two resource types (memory and CPU). I read that BDA 3.0 / CDH 5.0 does not yet support CPU allocation, only memory allocation, so concurrent tasks per node = yarn.nodemanager.resource.memory-mb divided by mapreduce.[map|reduce].memory.mb. Can you confirm this?
 Does that mean the calculation uses both memory and CPU under the FIFO scheduler, but memory only under the Fair Scheduler? Since we are using the Fair Scheduler, I assume cores are ignored in the calculation. Is that right?
 The Fair Scheduler allocation format is different in CDH 5.0.1, and the allocation file cannot be migrated automatically from CDH 4 to CDH 5; it has to be done manually. How can we set up the same allocations in CDH 5 that we had in CDH 4?
 Do Hue users need an account on all BDA nodes, or only on the Resource Manager (RM) nodes, to avoid the error "Failed to run job : Error assigning app to queue default" when running a job?
 How can we limit the number of apps/jobs a user can run at a time? In other words, for users who submit jobs to the cluster without a preexisting queue/pool, how can we limit the total number of jobs they can submit to the cluster (essentially the equivalent of userMaxJobsDefault set to 2, as in CDH 4)?
 We set yarn.nodemanager.resource.memory-mb in Cloudera Manager (CM), but the job.xml file still shows the default 8 GB. Why isn't this setting taking effect?
 Some Hive queries fail with the heap-space error "org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space". Should we increase the ApplicationMaster Java Maximum Heap Size, Map Task Maximum Heap Size, and Reduce Task Maximum Heap Size from the BDA default of 787.69 MB?
 What is the "Java Heap Size of NodeManager" setting, and how is it related to the container memory given by yarn.nodemanager.resource.memory-mb?
 Is there any way to exclude one entire rack (18 nodes) for just one user? Can I use the yarn.resourcemanager.nodes.exclude-path parameter somehow on the server where the user runs the job, so the rack 4 nodes are excluded from consideration?
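On the heap-size questions: the map/reduce JVM heap (-Xmx) must fit inside the container size requested via mapreduce.[map|reduce].memory.mb, and a common rule of thumb is to set the heap to roughly 75-80% of the container so the remainder covers JVM overhead. A hypothetical mapred-site.xml fragment (values are examples only, not recommendations):

```xml
<!-- Illustrative values only; tune to your workload. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>          <!-- container size requested from YARN -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1229m</value>     <!-- JVM heap, ~80% of the container -->
</property>
```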
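For the Fair Scheduler migration and per-user job-limit questions, the CDH 5 (YARN) Fair Scheduler reads its allocations from fair-scheduler.xml. A minimal sketch, assuming a default cap of 2 concurrent apps per user; the queue name and limits are hypothetical:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Default cap on concurrently running apps per user,
       roughly equivalent to CDH 4's userMaxJobsDefault. -->
  <userMaxAppsDefault>2</userMaxAppsDefault>

  <!-- Example queue with its own running-app cap. -->
  <queue name="analytics">
    <maxRunningApps>10</maxRunningApps>
  </queue>
</allocations>
```

Apps submitted beyond these caps are queued rather than rejected; they start once a running app finishes.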
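On excluding nodes: yarn.resourcemanager.nodes.exclude-path is read by the ResourceManager and applies cluster-wide, so it cannot restrict nodes for a single user. The sketch below shows how the property is wired up; the file path is an example:

```xml
<!-- yarn-site.xml on the ResourceManager; path is an example. -->
<property>
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value>/etc/hadoop/conf/yarn.exclude</value>
</property>
```

The referenced file lists one NodeManager hostname per line; changes take effect after running `yarn rmadmin -refreshNodes`.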
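As a sanity check on the memory-only concurrency formula asked about above, the per-node calculation can be worked through with illustrative numbers. All resource values below are assumptions for the sketch, not BDA defaults:

```python
# Sketch of the per-node task-concurrency calculation discussed above.
# All values are illustrative assumptions, not BDA defaults.
node_memory_mb = 49152   # yarn.nodemanager.resource.memory-mb (assumed 48 GB)
node_vcores = 24         # yarn.nodemanager.resource.cpu-vcores (assumed)
map_task_mb = 1536       # mapreduce.map.memory.mb (assumed)
map_task_vcores = 1      # mapreduce.map.cpu.vcores (assumed)

# Memory-only scheduling: concurrency is node memory / per-task memory.
concurrent_maps_mem = node_memory_mb // map_task_mb

# If CPU were also considered, the limit would be the minimum
# across the two resource types.
concurrent_maps_both = min(node_memory_mb // map_task_mb,
                           node_vcores // map_task_vcores)

print(concurrent_maps_mem)   # 32
print(concurrent_maps_both)  # 24
```

Note how enforcing vcores lowers the limit from 32 to 24 in this example, which is exactly the min-over-resources behavior the question describes.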
