This workshop focuses on Spark and Kafka, using Scala as the programming language.
- Getting Started
- Fundamentals of Programming – Using Scala
- Big Data ecosystem – Overview
- Apache Spark 2 – Architecture and Core APIs
- Apache Spark 2 – Data Frames and Spark SQL
- Apache Spark 2 – Building Streaming Pipelines
Getting Started
This course is taught from a Data Engineering perspective, where data processing skills are essential. SQL remains one of the most widely used ways of processing data, so this session starts with a revision of SQL and an overview of the Big Data ecosystem.
- Setting up the Environment
- Revision of SQL
Fundamentals of Programming – Using Scala
As part of this module we will learn the basics of programming, using Scala as the programming language.
- Overview of Scala REPL
- Declaring Variables
- Functions and Operators
- User Defined Functions
- Object Oriented Concepts – Overview
- Collections and Tuples
- Development Life Cycle
- and more
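To give a feel for the topics listed above, here is a minimal sketch in plain Scala. The object name `ScalaBasics`, the `Order` case class and all sample values are made up for illustration and are not taken from the course material.

```scala
object ScalaBasics {
  // Declaring variables: val is immutable, var can be reassigned
  val workshop: String = "Spark and Kafka"
  var attendees: Int = 0

  // A user-defined function with typed parameter and return type
  def square(n: Int): Int = n * n

  // An anonymous function stored in a value
  val isEven: Int => Boolean = n => n % 2 == 0

  // Collections: filter and map return new lists, sum aggregates
  val numbers: List[Int] = List(1, 2, 3, 4, 5)
  val evenSquares: List[Int] = numbers.filter(isEven).map(square)
  val total: Int = evenSquares.sum

  // Tuples group a fixed number of values of possibly different types
  val order: (Int, String, Double) = (1, "COMPLETE", 199.99)
  val (orderId, status, amount) = order

  // Object-oriented concepts: a case class is a compact immutable data type
  case class Order(id: Int, status: String, amount: Double)
  val o: Order = Order(orderId, status, amount)

  def main(args: Array[String]): Unit = {
    attendees += 25 // reassignment is allowed only because attendees is a var
    println(s"$workshop workshop, attendees: $attendees")
    println(s"even squares: $evenSquares, total: $total")
    println(o)
  }
}
```

The same definitions can be typed one at a time into the Scala REPL, without the surrounding object, to see the inferred type of each value.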
Apache Spark 2 – Architecture and Core APIs
As part of this module we will go through the Core APIs of Spark.
- Apache Spark Official Documentation
- Creating Resilient Distributed Datasets (RDDs)
- Data Processing using Transformations and Actions
- Understanding Execution Life Cycle
- and more
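As a rough illustration of the topics above, the sketch below creates an RDD, applies transformations and triggers execution with an action. It assumes Spark 2.x is on the classpath and runs in local mode. The object name `RddSketch`, the helper `wordCounts` and the sample lines are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object RddSketch {
  // Hypothetical helper: counts words across the given lines using the RDD API
  def wordCounts(spark: SparkSession, lines: Seq[String]): Map[String, Int] = {
    // Create an RDD from an in-memory collection
    val rdd = spark.sparkContext.parallelize(lines)

    // Transformations are lazy: they only describe the computation
    val counts = rdd
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // collect() is an action: it triggers execution and returns results to the driver
    counts.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-sketch")
      .master("local[*]") // assumption: local mode, for experimentation only
      .getOrCreate()

    val result = wordCounts(spark, Seq("spark kafka", "spark scala", "kafka connect"))
    result.foreach { case (word, count) => println(s"$word -> $count") }

    spark.stop()
  }
}
```

Nothing is computed until `collect()` is called, which is the distinction between transformations and actions that the execution life cycle builds on.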
Apache Spark 2 – Data Frames and Spark SQL
Data Frames and Spark SQL have become core modules of Apache Spark, and most new applications are developed using them.
- Creating Data Frames from Files and Databases
- Pre-Defined Functions
- Basic Transformations using Data Frame APIs
- Windowing Functions using Data Frame APIs
- Basic Transformations using Spark SQL
- Windowing Functions using Spark SQL
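The sketch below shows one way the same windowing logic might be written with both the Data Frame APIs and Spark SQL. It assumes Spark 2.x in local mode. The object name `WindowSketch`, the helper `rankWithinCategory`, the column names and the sample rows are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rank, sum}

object WindowSketch {
  // Hypothetical helper: adds per-category revenue and a rank within each category
  def rankWithinCategory(sales: DataFrame): DataFrame = {
    val byCategory = Window.partitionBy("category")
    sales
      .withColumn("category_revenue", sum(col("revenue")).over(byCategory))
      .withColumn("rank_in_category",
        rank().over(byCategory.orderBy(col("revenue").desc)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("window-sketch")
      .master("local[*]") // assumption: local mode, for experimentation only
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data
    val sales = Seq(
      ("electronics", "laptop", 1200.0),
      ("electronics", "phone", 800.0),
      ("books", "novel", 20.0),
      ("books", "atlas", 45.0)
    ).toDF("category", "product", "revenue")

    // Data Frame API version
    rankWithinCategory(sales).show()

    // Spark SQL version of the same ranking
    sales.createOrReplaceTempView("sales")
    spark.sql(
      """SELECT category, product, revenue,
        |       rank() OVER (PARTITION BY category ORDER BY revenue DESC) AS rank_in_category
        |FROM sales""".stripMargin).show()

    spark.stop()
  }
}
```

Both versions express the same partition-and-order specification, so the choice between them is mostly a matter of which style fits the rest of the application.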
Apache Spark 2 – Building Streaming Pipelines
As part of this module we will see how to build streaming pipelines using Kafka and Spark Structured Streaming.
- Getting Started with Kafka
- Overview of Kafka Producer and Consumer APIs
- Getting Started with Spark Structured Streaming
- End to End Streaming Pipeline using Kafka Connect, Kafka and Spark Structured Streaming
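As a rough idea of the Kafka to Spark leg of such a pipeline, the sketch below reads a Kafka topic with Spark Structured Streaming and prints each message to the console. It assumes the `spark-sql-kafka-0-10` connector is on the classpath and a broker is reachable. The broker address `localhost:9092`, the topic name `events` and the object name `KafkaStreamSketch` are placeholders, and the Kafka Connect side is not shown.

```scala
import org.apache.spark.sql.SparkSession

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-stream-sketch")
      .master("local[*]") // assumption: local mode, for experimentation only
      .getOrCreate()

    // Subscribe to a Kafka topic; keys and values arrive as binary columns
    val messages = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "events")                       // placeholder topic
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // Write each micro-batch to the console; a real pipeline would use a durable sink
    val query = messages.writeStream
      .format("console")
      .outputMode("append")
      .start()

    // Block until the streaming query is stopped or fails
    query.awaitTermination()
  }
}
```

The console sink is used here only because it needs no extra setup. Swapping it for a file or database sink changes the `writeStream` section but leaves the Kafka source definition as it is.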