Durga Gadiraju

Big Data Hadoop and Spark Training – Corporate Exclusive – 201804

This course is currently closed

Kick Off

Lab access for the attendees was arranged before the kick-off session. The following topics are covered as part of the kick-off session:

Check Zoom connectivity and troubleshoot any issues
Demonstrate how to complete the sign-up process for the labs
Instructions to set up the development environment using IntelliJ for Scala and Spark

Here is the video archive for the session

Here are the instructions to set up the development environment using IntelliJ for Scala and Spark:

Setup Spark Development Environment – IntelliJ and Scala

Tentative Schedule for the course

Week 1 – Basics of Programming using Scala
- Basic programming constructs
- Functions and Anonymous Functions
- Overview of Object Oriented Concepts
- Scala Collections API
- Basic I/O operations
- Development life cycle
Week 2 – Introduction to Big Data and Sqoop
- Introduction to Big Data
- Getting Started with Sqoop
- Sqoop Import
- Sqoop Export
Week 3 – Spark Overview, RDD Operations
- Architecture of Spark
- Reading data into RDDs and writing back to HDFS
- Row level transformations
- Aggregations
- Joining RDDs
- Sorting and Ranking RDDs
Week 4 – Spark SQL and Hive
- Introduction to query engines – Hive, Tez, Impala and Spark SQL
- Create tables in Hive
- Load data to Hive tables
- Basic Hive Queries – Projection, Filtering, Aggregations, Joins, etc.
- Advanced Hive Queries – Analytic and Windowing Functions
Week 5 – Data Frame Operations
- Data Frames and Data Frame operations
- Running Hive queries from Spark
- Different file formats
- Compressing data using different file formats
Week 6 – Structured Streaming pipelines
- Basics of HBase
- Basics of Kafka
- Integrating Kafka with Spark Streaming
- Building a data pipeline with Kafka, Spark Streaming and HBase