Let us review some of the important properties of Spark.
- Like MapReduce, Spark creates containers to process data.
- These containers are called executors.
- In MapReduce, the number of map tasks is driven by the number of blocks in the underlying files, whereas Spark creates containers based on the configured allocation.
- There are two types of allocation: static and dynamic (see the configuration sketch after this list).
- In plain vanilla Spark, allocation is static by default.
- In the Cloudera distribution, dynamic allocation is enabled by default.
- Let us review the properties related to executors and allocation for Spark applications. From the command prompt we can check spark-env.sh and spark-defaults.conf under /etc/spark/conf (see the commands after this list).
- spark-env.sh is a shell script used to set environment variables, whereas spark-defaults.conf is a properties file that controls the runtime behavior of Spark jobs.
- Unlike Hadoop configuration files, Spark configuration files are not XML files; they are standard properties files in which properties are defined as key-value pairs.
- In spark-defaults.conf, a key is conventionally separated from its value by whitespace, though the standard properties format also accepts "="; in spark-env.sh, values are assigned with "=" as in any shell script (see the format examples after this list).
- Memory settings are primarily defined in spark-env.sh, via environment variables such as SPARK_EXECUTOR_MEMORY (see the sketch after this list).
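
As a sketch of what static versus dynamic allocation looks like, the snippet below shows properties that could go in spark-defaults.conf. The property names are standard Spark configuration keys; the specific values (executor counts and bounds) are illustrative assumptions.

```properties
# Hypothetical excerpt from /etc/spark/conf/spark-defaults.conf (values are illustrative).
# Only one of the two approaches below would normally be used at a time.

# Static allocation: a fixed number of executors for the life of the application.
spark.executor.instances              4

# Dynamic allocation: executors are requested and released based on the workload.
# The external shuffle service lets executors be removed without losing shuffle data.
spark.dynamicAllocation.enabled       true
spark.shuffle.service.enabled         true
spark.dynamicAllocation.minExecutors  1
spark.dynamicAllocation.maxExecutors  10
```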
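To look at the configuration files from the command prompt, something like the following works (using the path given above):

```bash
# List the Spark configuration directory and view the two files discussed above.
ls /etc/spark/conf
cat /etc/spark/conf/spark-env.sh
cat /etc/spark/conf/spark-defaults.conf
```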
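The difference in format between the two files can be seen side by side. The entries below are a minimal sketch with illustrative values; the variable and property names are standard ones.

```bash
# spark-env.sh is a shell script: environment variables are assigned with "=".
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_LOG_DIR=/var/log/spark
```

```properties
# spark-defaults.conf is a properties file: one key-value pair per line,
# with the key and value conventionally separated by whitespace.
spark.master            yarn
spark.eventLog.enabled  true
```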
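For memory settings in spark-env.sh, a minimal sketch follows; the variable names come from the standard spark-env.sh template, and the values are illustrative assumptions.

```bash
# Memory-related environment variables in spark-env.sh.
export SPARK_EXECUTOR_MEMORY=2g   # memory per executor
export SPARK_DRIVER_MEMORY=1g     # memory for the driver
export SPARK_WORKER_MEMORY=4g     # total memory a worker can allocate (standalone mode only)
```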