Let us review some of the important properties of Spark.
- Like MapReduce, Spark creates containers to process data.
- These containers are called executors.
- In MapReduce, the number of map tasks is driven by the number of blocks in the underlying files, whereas Spark creates containers based on the configured allocation.
- There are two types of allocation: static and dynamic (see the configuration sketch after this list).
- In plain vanilla Spark, allocation is static by default.
- In the Cloudera distribution, dynamic allocation is enabled by default.
- Let us review the properties related to executors and allocation for Spark applications. From the command prompt we can check spark-env.sh and spark-defaults.conf under /etc/spark/conf (see the commands after this list).
- spark-env.sh is a shell script used to set environment variables, whereas spark-defaults.conf is a properties file that controls the runtime behavior of Spark jobs.
- Unlike Hadoop configuration files, Spark configuration files are not XML files; they are standard properties files in which properties are defined as key-value pairs.
- In spark-defaults.conf, a key is conventionally separated from its value by whitespace, though the standard properties format also accepts "="; in spark-env.sh, values are assigned with "=" as in any shell script (see the format examples after this list).
- Memory settings are primarily defined in spark-env.sh, via environment variables such as SPARK_EXECUTOR_MEMORY (see the sketch after this list).
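
As a sketch of what static versus dynamic allocation looks like, the snippet below shows properties that could go in spark-defaults.conf. The property names are standard Spark configuration keys; the specific values (executor counts and bounds) are illustrative assumptions.

```properties
# Hypothetical excerpt from /etc/spark/conf/spark-defaults.conf (values are illustrative).
# Only one of the two approaches below would normally be used at a time.

# Static allocation: a fixed number of executors for the life of the application.
spark.executor.instances              4

# Dynamic allocation: executors are requested and released based on the workload.
# The external shuffle service lets executors be removed without losing shuffle data.
spark.dynamicAllocation.enabled       true
spark.shuffle.service.enabled         true
spark.dynamicAllocation.minExecutors  1
spark.dynamicAllocation.maxExecutors  10
```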
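To look at the configuration files from the command prompt, something like the following works (using the path given above):

```bash
# List the Spark configuration directory and view the two files discussed above.
ls /etc/spark/conf
cat /etc/spark/conf/spark-env.sh
cat /etc/spark/conf/spark-defaults.conf
```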
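The difference in format between the two files can be seen side by side. The entries below are a minimal sketch with illustrative values; the variable and property names are standard ones.

```bash
# spark-env.sh is a shell script: environment variables are assigned with "=".
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_LOG_DIR=/var/log/spark
```

```properties
# spark-defaults.conf is a properties file: one key-value pair per line,
# with the key and value conventionally separated by whitespace.
spark.master            yarn
spark.eventLog.enabled  true
```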
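For memory settings in spark-env.sh, a minimal sketch follows; the variable names come from the standard spark-env.sh template, and the values are illustrative assumptions.

```bash
# Memory-related environment variables in spark-env.sh.
export SPARK_EXECUTOR_MEMORY=2g   # memory per executor
export SPARK_DRIVER_MEMORY=1g     # memory for the driver
export SPARK_WORKER_MEMORY=4g     # total memory a worker can allocate (standalone mode only)
```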