Review Important Properties

Let us review some of the important properties of Spark.

  • Like Map Reduce, Spark creates containers to process data.
  • These containers are called Executors.
  • In Map Reduce, the number of Map Tasks is based on the number of blocks in the underlying file, whereas Spark creates containers based on the configured allocation.
  • There are two types of allocation – static and dynamic.
  • In plain vanilla Spark, allocation is static by default.
  • In the Cloudera distribution, allocation is dynamic by default (see the spark-defaults.conf sketch after this list).
  • Let us review the properties related to executors and allocation for Spark applications. From the command prompt we can check spark-env.sh and spark-defaults.conf under /etc/spark/conf.
  • spark-env.sh is a shell script that sets environment variables, whereas spark-defaults.conf is a properties file that controls the runtime behavior of Spark jobs.
  • Unlike Hadoop configuration files, Spark configuration files are not XML files; they are standard properties files where properties are defined as key-value pairs.
  • Key and value are separated by “=”.
  • Memory settings are primarily defined in spark-env.sh (see the spark-env.sh sketch below).
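
To make the format concrete, here is a minimal sketch of what spark-defaults.conf entries might look like. The property names (spark.dynamicAllocation.enabled, spark.shuffle.service.enabled, spark.executor.instances, and so on) are standard Spark configuration keys, but the values shown are illustrative assumptions, not recommended settings.

    # spark-defaults.conf – each line is a key=value pair
    # Dynamic allocation (the default in the Cloudera distribution);
    # it relies on the external shuffle service being enabled.
    spark.dynamicAllocation.enabled=true
    spark.shuffle.service.enabled=true
    spark.dynamicAllocation.minExecutors=1
    spark.dynamicAllocation.maxExecutors=10

    # With static allocation, a fixed number of executors
    # is requested for the life of the application instead:
    # spark.dynamicAllocation.enabled=false
    # spark.executor.instances=2

With static allocation the application holds on to its executors for its entire lifetime, whereas with dynamic allocation Spark scales the executor count between the configured minimum and maximum based on the workload.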
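Likewise, here is a minimal sketch of memory-related entries in spark-env.sh, assuming the environment variable names that appear in the spark-env.sh.template shipped with Spark; the sizes are placeholder values, not recommendations.

    # spark-env.sh – a shell script, so settings are exported environment variables
    export SPARK_DRIVER_MEMORY=1g     # memory for the driver process
    export SPARK_EXECUTOR_MEMORY=2g   # memory per executor
    export SPARK_DAEMON_MEMORY=1g     # memory for Spark daemons such as the history server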
