Spark Modules

In earlier versions of Spark, the core API sat at the bottom of the stack, and all the higher-level modules were built on top of it. Examples of core APIs are map, reduce, join, groupByKey, etc. But with Spark 2, Data Frames and Spark SQL have become the core module.
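
To make the shift concrete, here is a minimal Scala sketch, assuming a local SparkSession, that computes a word count first with the core RDD API and then with the Data Frame and Spark SQL interface. Names such as WordCountComparison are illustrative only:

```scala
import org.apache.spark.sql.SparkSession

object WordCountComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCountComparison")
      .master("local[*]") // assumption: a local run for illustration
      .getOrCreate()
    import spark.implicits._

    val lines = Seq("spark core api", "spark sql api")

    // Core API style: RDD transformations (flatMap, map) plus reduceByKey
    val rddCounts = spark.sparkContext.parallelize(lines)
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    rddCounts.collect().foreach(println)

    // Spark 2 style: the same logic as a Data Frame, queried through Spark SQL
    val words = lines.toDF("line")
      .selectExpr("explode(split(line, ' ')) AS word")
    words.createOrReplaceTempView("words")
    spark.sql("SELECT word, count(*) AS cnt FROM words GROUP BY word").show()

    spark.stop()
  }
}
```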

  • Core – Transformations and Actions – APIs such as map, reduce, join, filter, etc. They typically work on RDDs (contrasted with Data Frames in the sketch above)
  • Spark SQL and Data Frames – APIs and a Spark SQL interface for batch processing on top of Data Frames or Data Sets (the typed Data Set API is not available for Python)
  • Structured Streaming – APIs and a Spark SQL interface for stream processing on top of Data Frames (see the first sketch after this list)
  • Machine Learning Pipelines – APIs to chain data preparation steps and Machine Learning algorithms into pipelines on top of Data Frames (see the second sketch after this list)
  • GraphX – APIs for graph processing (see the third sketch after this list)

We can build applications using different programming languages such as Scala, Python, Java, and R, leveraging the Spark APIs of the above-mentioned modules.
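
The Structured Streaming sketch below treats a socket text stream as an unbounded Data Frame and maintains a running word count. The host and port are hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingWordCount")
      .master("local[*]")
      .getOrCreate()

    // Read a socket text stream as an unbounded Data Frame
    // (localhost:9999 is a placeholder source for illustration)
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // Split each incoming line into words and keep a running count
    val counts = lines
      .selectExpr("explode(split(value, ' ')) AS word")
      .groupBy("word")
      .count()

    // Continuously print the updated counts to the console
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```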
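The Machine Learning Pipelines sketch below chains a tokenizer, a feature hasher, and logistic regression into one pipeline over a toy Data Frame. The training data and stage settings are illustrative only:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PipelineSketch")
      .master("local[*]")
      .getOrCreate()

    // Toy training data: (text, label) — made up for illustration
    val training = spark.createDataFrame(Seq(
      ("spark is fast", 1.0),
      ("hadoop map reduce", 0.0)
    )).toDF("text", "label")

    // Chain feature extraction and a learning algorithm into one pipeline
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // Fitting the pipeline runs every stage over the Data Frame
    val model = pipeline.fit(training)
    model.transform(training).select("text", "prediction").show()

    spark.stop()
  }
}
```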
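Finally, a GraphX sketch. Note that GraphX works on RDDs rather than Data Frames; the tiny graph and the PageRank tolerance below are made up for illustration:

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object GraphSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("GraphSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // A tiny directed graph: vertices carry names, edges carry a relationship
    val vertices = sc.parallelize(Seq(
      (1L, "alice"), (2L, "bob"), (3L, "carol")
    ))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(2L, 3L, "follows")
    ))
    val graph = Graph(vertices, edges)

    // Rank vertices with the built-in PageRank algorithm, then attach names
    graph.pageRank(tol = 0.001).vertices
      .join(vertices)
      .collect()
      .foreach { case (_, (rank, name)) => println(s"$name -> $rank") }

    spark.stop()
  }
}
```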
