This course is currently closed
Kick Off
Before the kick-off session, we had already set up lab access for the attendees. The following topics were covered as part of the kick-off session:
- Check Zoom connectivity and troubleshoot any issues
- Demonstrate how to complete the sign-up process for the labs
- Instructions to set up a development environment for Scala and Spark using IntelliJ
Here is the video archive of the session.
Here are the instructions to set up a development environment for Scala and Spark using IntelliJ.
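As a rough illustration of what such a setup usually involves, here is a minimal sbt build definition that IntelliJ (with the Scala plugin) can import as a project. The project name and the Scala and Spark versions are assumptions chosen for this sketch, not values taken from the course instructions; match them to the versions used in the labs.

```scala
// build.sbt: a minimal sketch, assuming sbt with Scala 2.12 and Spark 3.5.x.
// The project name and versions are illustrative assumptions, not course values.
name := "spark-scala-lab" // hypothetical project name
version := "0.1.0"
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  // spark-sql pulls in spark-core transitively; both are listed for clarity
  "org.apache.spark" %% "spark-core" % "3.5.1",
  "org.apache.spark" %% "spark-sql"  % "3.5.1"
)
```

The `%%` operator appends the Scala binary version (here `_2.12`) to the artifact name, so the Spark artifacts stay in step with `scalaVersion`.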
Tentative Schedule for the course
- Week 1 – Basics of Programming using Scala
  - Basic programming constructs
  - Functions and Anonymous Functions
  - Overview of Object-Oriented Concepts
  - Scala Collections API
  - Basic I/O operations
  - Development life cycle
- Week 2 – Introduction to Big Data and Sqoop
  - Introduction to Big Data
  - Getting Started with Sqoop
  - Sqoop Import
  - Sqoop Export
- Week 3 – Spark Overview, RDD Operations
  - Architecture of Spark
  - Reading data into RDDs and writing back to HDFS
  - Row-level transformations
  - Aggregations
  - Joining RDDs
  - Sorting and Ranking RDDs
- Week 4 – Spark SQL and Hive
  - Introduction to query engines – Hive, Tez, Impala and Spark SQL
  - Create tables in Hive
  - Load data into Hive tables
  - Basic Hive Queries – Projection, Filtering, Aggregations, Joins etc.
  - Advanced Hive Queries – Analytic and Windowing Functions
- Week 5 – Data Frame Operations
  - Data Frames and Data Frame operations
  - Running Hive queries from Spark
  - Different file formats
  - Compressing data using different file formats
- Week 6 – Structured Streaming pipelines
  - Basics of HBase
  - Basics of Kafka
  - Integrating Kafka with Spark Streaming
  - Building a data pipeline with Kafka, Spark Streaming and HBase
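The Week 1 topics on functions, anonymous functions and the Collections API can be previewed with a short Scala program that uses only the standard library. The object `Week1Basics`, the `Order` case class and the sample data are hypothetical illustrations, not course material; the group-and-sum step uses plain Scala collections and is only loosely analogous to the RDD aggregations covered in Week 3.

```scala
// A minimal sketch of Week 1 constructs; all names and data are hypothetical.
object Week1Basics {
  // Named function with explicit parameter and return types
  def square(x: Int): Int = x * x

  // Anonymous function (function literal) stored in a value
  val isEven: Int => Boolean = n => n % 2 == 0

  // Case class as a simple object-oriented data model
  case class Order(id: Int, status: String, amount: Double)

  // Group orders by status and sum the amounts in each group
  def revenueByStatus(orders: List[Order]): Map[String, Double] =
    orders.groupBy(_.status).map { case (status, group) =>
      status -> group.map(_.amount).sum
    }

  def main(args: Array[String]): Unit = {
    val numbers = List(1, 2, 3, 4, 5)

    val squares = numbers.map(square)      // List(1, 4, 9, 16, 25)
    val evens = numbers.filter(isEven)     // List(2, 4)
    val total = numbers.foldLeft(0)(_ + _) // 15

    val orders = List(
      Order(1, "COMPLETE", 100.0),
      Order(2, "PENDING", 50.0),
      Order(3, "COMPLETE", 25.5)
    )

    println(squares)
    println(evens)
    println(total)
    // Map iteration order is not guaranteed
    println(revenueByStatus(orders))
  }
}
```

Running `main` prints the squares, the even numbers, the total and the revenue per status (`COMPLETE` sums to 125.5, `PENDING` to 50.0).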