Big Data Hadoop and Spark Training – Corporate Exclusive – 201804

Kick Off

Lab access for the attendees was set up before the kick off session. The following topics were covered as part of the kick off session:

  • Check Zoom connectivity and troubleshoot any issues
  • Demonstrate how to complete the sign-up process for the labs
  • Instructions to set up the development environment using IntelliJ for Scala and Spark

Here is the video archive for the session

Here are the instructions to set up the development environment using IntelliJ for Scala and Spark.

Setup Spark Development Environment – IntelliJ and Scala

Tentative Schedule for the course

  • Week 1 – Basics of Programming using Scala
    • Basic programming constructs
    • Functions and Anonymous Functions
    • Overview of Object Oriented Concepts
    • Scala Collections API
    • Basic I/O operations
    • Development life cycle
  • Week 2 – Introduction to Big Data and Sqoop
    • Introduction to Big Data
    • Getting Started with Sqoop
    • Sqoop Import
    • Sqoop Export
  • Week 3 – Spark Overview, RDD Operations
    • Architecture of Spark
    • Reading data into RDDs and writing back to HDFS
    • Row level transformations
    • Aggregations
    • Joining RDDs
    • Sorting and Ranking RDDs
  • Week 4 – Spark SQL and Hive
    • Introduction to query engines – Hive, Tez, Impala and Spark SQL
    • Create tables in Hive
    • Load data to Hive tables
    • Basic Hive Queries – Projection, Filtering, Aggregations, Joins, etc.
    • Advanced Hive Queries – Analytic and Windowing Functions
  • Week 5 – Data Frame Operations
    • Data Frames and Data Frame operations
    • Running Hive queries from Spark
    • Different file formats
    • Compressing data using different file formats
  • Week 6 – Structured Streaming pipelines
    • Basics of HBase
    • Basics of Kafka
    • Integrating Kafka with Spark Streaming
    • Building a data pipeline with Kafka, Spark Streaming and HBase