Hadoop Frequently Asked Questions (FAQ)
(Doc ID 1530797.1)
Last updated on MAY 24, 2021
Applies to:
Big Data Appliance Integrated Software - Version 2.0.1 and later
Linux x86-64
Purpose
This document provides answers to frequently asked questions about Hadoop as distributed by Cloudera for use on the Oracle Big Data Appliance (BDA).
Questions and Answers
In this Document
Purpose
Questions and Answers
Is the environment variable $HADOOP_HOME used in CDH 4.1.2?
In lieu of the environment variable $HADOOP_HOME, what should be used in CDH 4.1.2?
Should OS disks (/dev/sda, /dev/sdb) be used to store local data? HDFS data?
How can data on the OS disks be cleaned up, since storing it there is not recommended?
Does the Cloudera CDH client have to be installed on all Exadata DB nodes?
If a disk goes bad and is replaced, how can you verify the disk is functional with regard to HDFS?
If one of the services managed by Cloudera Manager (CM) goes into "BAD" health, is there a recommended order for checking the status of services?
If the nodes of the BDA cluster have been up for close to 200 days, is a reboot recommended?
Can you decommission non-critical nodes from a BDA HDFS cluster in order to install NoSQL?
For HA testing, is it possible to relocate Hive services to a different node after a Hive node failure?
What options are available for migrating service roles on the BDA?
What are the options for destroying, i.e. performing a non-recoverable delete of, all the data stored on the DataNodes in HDFS?
Running a very long reducer seems to be filling one DataNode. Why would that be?
Why are zookeeper, hdfs, mapred, yarn, hive, and sqoop users in /etc/passwd?
Is it possible to limit the memory and CPU consumption of different BDA processes so they do not exceed a specific threshold?
Are HDFS Encryption and Navigator Key Trustee of the Cloudera stack supported on BDA 4.1 with CDH 5.3.0?
If implementing a script to replicate HDFS and Hive data using the Cloudera API, is it possible to use the current timezone for scheduling the replication?
Is there a property, or method, to copy data from a local file system to HDFS in parallel to speed up data copies?
Where can installation information be found for Cloudera Data Science Workbench?
References
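For the question about copying local data into HDFS in parallel, one commonly used approach (not necessarily the answer given in the full document, which requires sign-in) is `hadoop distcp`, which runs the copy as a distributed MapReduce job. The paths, host name, and map-task count below are illustrative only:

```shell
# Copy a local directory into HDFS in parallel using DistCp, which runs
# the copy as a distributed MapReduce job. A file:// source must be
# visible at the same path on every node that runs a map task
# (e.g. an NFS mount). Paths and the map count (-m) are examples.
hadoop distcp -m 20 \
    file:///shared/staging/dataset \
    hdfs://namenode-host:8020/user/oracle/dataset

# For a source on a single node, a simpler (per-file, serial) alternative:
hadoop fs -put /local/staging/dataset /user/oracle/dataset
```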
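For the question about verifying a replaced disk with regard to HDFS, the generic HDFS health checks below are a sketch of what such a verification might use; they are stock Hadoop commands, not the BDA-specific replacement procedure:

```shell
# Report overall HDFS health; after the replaced disk rejoins and blocks
# re-replicate, this should report "Status: HEALTHY" with no missing or
# corrupt blocks.
hdfs fsck /

# Show per-DataNode capacity and state, to confirm the node with the
# new disk is live and reporting the expected configured capacity.
hdfs dfsadmin -report
```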