Running a Spark Job From a Non-BDA Edge Node Fails With: "ERROR shuffle.RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks"
(Doc ID 2498643.1)
Last updated on JULY 03, 2022
Applies to: Big Data Appliance Integrated Software - Version 4.7.0 and later
When a spark2 job is run from an edge node that is not on the BDA, it fails with an error like the following:
java.io.IOException: Failed to connect to bdahostname.<DOMAIN>/<IP Address of BDA node>:<port number>
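Since the message reports a failed connection to a specific host and port, it can be useful to confirm from the edge node whether that endpoint is reachable at all. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available on the edge node; the function name `check_port` is hypothetical, and the host and port must be taken from your own error message. It only tests TCP reachability and does not by itself identify the cause of the failure.

```shell
#!/usr/bin/env bash
# check_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: returns 0 if a TCP connection to HOST:PORT can be
# opened within the timeout, non-zero otherwise.
check_port() {
  local host="$1"
  local port="$2"
  local wait_secs="${3:-3}"
  # bash's /dev/tcp pseudo-device opens a TCP socket; timeout bounds the attempt.
  timeout "$wait_secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (placeholders, substitute the values from the error message):
#   check_port bdahostname.example.com 12345 && echo reachable || echo unreachable
```

A non-zero result from the edge node for the host and port in the error is consistent with the "Failed to connect" message.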
If the same job is run entirely on the BDA cluster (without the edge node), it works. The job also runs successfully from the edge node when the "--deploy-mode cluster" option is used, e.g. spark2-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster /opt/cloudera/parcels/CDH/lib/spark//lib/<your sparkapplication>.
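To compare the two submission modes side by side, the command from the paragraph above can be generated for either deploy mode. This is a minimal sketch: the function name `build_submit_cmd` and the `APP_JAR` variable are hypothetical, the jar path is a placeholder, and the script only prints the command so it can be reviewed before being run. Note that spark-submit uses client deploy mode when `--deploy-mode` is not given.

```shell
#!/usr/bin/env bash
# build_submit_cmd DEPLOY_MODE APP_JAR
# Hypothetical helper: prints the spark2-submit command line from the
# article for the given deploy mode ("client" or "cluster").
build_submit_cmd() {
  local deploy_mode="$1"
  local app_jar="$2"
  printf 'spark2-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode %s %s\n' \
    "$deploy_mode" "$app_jar"
}

# APP_JAR is a placeholder; set it to the real application jar path.
build_submit_cmd cluster "${APP_JAR:-/path/to/your-spark-application.jar}"
```

Printing the command with `client` instead of `cluster` gives the variant in which the driver runs on the edge node rather than inside the cluster.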
In this Document
Steps to Setup the iptables Rules
Steps to Revert the iptables Rules