In earlier versions of Spark, the core API sat at the bottom of the stack and all the higher-level modules were built on top of it. Examples of core APIs are map, reduce, join, groupByKey, etc. But with Spark 2, Data Frames and Spark SQL have become the core module.
- Core – Transformations and Actions – APIs such as map, reduce, join, filter, etc. They typically operate on RDDs
- Spark SQL and Data Frames – APIs and the Spark SQL interface for batch processing on top of Data Frames or Data Sets (the Data Set API is not available in Python)
- Structured Streaming – APIs and the Spark SQL interface for stream processing on top of Data Frames
- Machine Learning Pipelines – APIs to build Machine Learning data pipelines that apply Machine Learning algorithms on top of Data Frames
- GraphX – APIs for graph processing on top of RDDs
We can build applications in different programming languages such as Scala, Python, Java, and R, leveraging the Spark APIs of the above-mentioned modules. The short Scala sketches below illustrate each module in turn.
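To make the core module concrete, here is a minimal word count sketch using RDD transformations and actions. It assumes a local-mode SparkSession and a small in-memory sample dataset; flatMap, filter, map, and reduceByKey are transformations, while collect is the action that triggers execution.

```scala
import org.apache.spark.sql.SparkSession

object RDDWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RDDWordCount")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()
    val sc = spark.sparkContext

    // A small in-memory dataset stands in for real input
    val lines = sc.parallelize(Seq("spark core api", "map reduce join", "spark sql"))

    val counts = lines
      .flatMap(_.split("\\s+"))  // transformation: split lines into words
      .filter(_.nonEmpty)        // transformation: drop empty tokens
      .map(word => (word, 1))    // transformation: pair each word with 1
      .reduceByKey(_ + _)        // transformation: sum counts per word
      .collect()                 // action: bring results to the driver

    counts.foreach(println)
    spark.stop()
  }
}
```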
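For the Spark SQL and Data Frames module, the same aggregation can be expressed through the Data Frame API or through the Spark SQL interface over a temporary view. The Order case class and sample rows below are illustrative assumptions; both queries produce the same result.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type used only for this sketch
case class Order(id: Int, status: String, amount: Double)

object DataFrameExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A typed Data Set built from the case class; .toDF() would give an untyped Data Frame
    val orders = Seq(
      Order(1, "COMPLETE", 25.0),
      Order(2, "PENDING", 40.0),
      Order(3, "COMPLETE", 10.0)
    ).toDS()

    // Same query two ways: the Data Frame API ...
    orders.groupBy("status").sum("amount").show()

    // ... and the Spark SQL interface over a temporary view
    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT status, SUM(amount) FROM orders GROUP BY status").show()

    spark.stop()
  }
}
```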
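Structured Streaming reuses the Data Frame API over an unbounded input. The sketch below assumes a socket source on localhost:9999 (fed, for example, by `nc -lk 9999`); the word count logic is the same as in the batch case, but the query runs continuously and prints updated counts.

```scala
import org.apache.spark.sql.SparkSession

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingWordCount")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Read a stream of lines from a socket (assumes a listener on localhost:9999)
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // The same Data Frame operations used in batch apply to the stream
    val counts = lines.as[String]
      .flatMap(_.split("\\s+"))
      .groupBy("value")
      .count()

    // Continuously print updated counts to the console
    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```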
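A Machine Learning pipeline chains feature transformers and an estimator into a single unit that is fit on a Data Frame. The tiny labeled dataset and its column names below are illustrative assumptions; Tokenizer, HashingTF, and LogisticRegression are standard spark.ml stages.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object MLPipelineExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MLPipelineExample")
      .master("local[*]")
      .getOrCreate()

    // Tiny labeled Data Frame; rows and labels are made up for illustration
    val training = spark.createDataFrame(Seq(
      (0L, "spark is great", 1.0),
      (1L, "hadoop map reduce", 0.0),
      (2L, "spark sql rocks", 1.0),
      (3L, "legacy batch job", 0.0)
    )).toDF("id", "text", "label")

    // A pipeline chains feature stages and an estimator into one unit
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // Fitting runs each stage in order and returns a single model
    val model = pipeline.fit(training)
    model.transform(training).select("id", "text", "prediction").show()

    spark.stop()
  }
}
```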
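Finally, GraphX builds a property graph from RDDs of vertices and edges and ships with algorithms such as PageRank. The vertex names and edge labels in this sketch are made up for illustration.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object GraphXExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("GraphXExample")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // GraphX builds graphs from RDDs of vertices and edges
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))
    val graph = Graph(vertices, edges)

    // PageRank is one of the built-in graph algorithms
    graph.pageRank(0.001).vertices
      .join(vertices)                                   // attach names back to ranks
      .map { case (_, (rank, name)) => (name, rank) }
      .collect()
      .foreach(println)

    spark.stop()
  }
}
```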