Formula to Calculate Threshold Percentage Passed to Balancer for Efficiently Balancing the HDFS Cluster (Doc ID 1588427.1)

Last updated on DECEMBER 08, 2017

Applies to:

Big Data Appliance Integrated Software - Version 2.0.1 and later
Linux x86-64

Purpose

HDFS data might not always be placed uniformly across the DataNodes. One common reason is the addition of new DataNodes to an existing cluster. While placing new blocks (data for a file is stored as a series of blocks), the NameNode considers various parameters before choosing the DataNodes to receive these blocks. Some of the considerations are:

Due to multiple competing considerations, data might not be uniformly placed across the DataNodes. HDFS provides a tool for administrators, called the balancer, that analyzes block placement and rebalances data across the DataNodes.

This document explains the formula used by the Balancer to balance data on the Hadoop Distributed File System (HDFS), which helps in choosing a threshold value that balances the HDFS cluster efficiently. The threshold parameter is a percentage in the range of (0%, 100%) with a default value of 10%.
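As a rough illustration of what the threshold controls, the sketch below applies the balance criterion described in the general Apache Hadoop documentation: a DataNode counts as balanced when its utilization (used space divided by capacity) differs from the cluster-wide utilization by no more than the threshold. This is an assumption about the general criterion, not a reproduction of the formula in the Details section of this document. The node names, the sample figures, and the function names are hypothetical. On a real cluster the threshold is passed on the command line, for example `hdfs balancer -threshold 10`.

```python
def utilization(used, capacity):
    """Return used space as a percentage of capacity."""
    return 100.0 * used / capacity


def unbalanced_nodes(nodes, threshold=10.0):
    """Return the sorted names of DataNodes whose utilization deviates from
    the cluster-wide utilization by more than `threshold` percentage points.

    `nodes` maps a (hypothetical) node name to a (used, capacity) pair
    expressed in any consistent unit, for example terabytes.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_util = utilization(total_used, total_capacity)
    return sorted(
        name
        for name, (used, capacity) in nodes.items()
        if abs(utilization(used, capacity) - cluster_util) > threshold
    )


# Hypothetical cluster: three older nodes plus one newly added, nearly empty node.
cluster = {
    "dn1": (8.0, 10.0),
    "dn2": (7.0, 10.0),
    "dn3": (7.5, 10.0),
    "dn4": (0.5, 10.0),
}

if __name__ == "__main__":
    for pct in (10.0, 20.0, 60.0):
        print(pct, unbalanced_nodes(cluster, pct))
```

A smaller threshold marks more nodes as unbalanced, so the balancer has more data to move and runs longer. A larger threshold tolerates more skew and finishes sooner.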

Scope

HDFS Administrator

Details
