Comprehensive Spark 2 Workshop

This course covers:

  • Development life cycle
    • Set up Java (JDK), Scala, Spark and sbt
    • Install IntelliJ with Scala plugin
    • Overview of IntelliJ
    • Develop first Scala application
    • Develop first Spark application and run in local mode
  • Learn Scala for Spark
    • Variables and Data types in Scala
    • Basic programming constructs
    • Pre-defined functions
    • User defined functions and anonymous functions
    • Object Oriented Concepts
    • Overview of Collections and Tuples
    • Map Reduce APIs
  • Spark Overview
    • Develop first Spark application using Core APIs
    • Run it on the cluster and understand execution life cycle
    • Understanding shuffling and difference between groupByKey, reduceByKey and aggregateByKey
    • Spark execution modes – local, standalone, Mesos, YARN and Kubernetes
  • Setting up Spark 2 Cluster using Mesos
    • Set up ZooKeeper and Mesos using Mesosphere
    • Install Spark on the cluster
    • Integrate with file systems – local and S3
    • Run Spark job on Mesos and understand execution life cycle
  • RDD, Data Frames and Data Sets
    • Reading and writing RDD using different file systems and file formats
    • Building Data Frames, Data Sets
    • Difference between RDD, Data Frames and Data Sets
    • Building Schema programmatically and building Data Frames
    • Writing data frames to different file formats
    • Applying compression algorithms while writing the data
  • Processing Data using Data Frame Operations
    • Understanding functions
    • Filtering, Aggregations, Joins and Sorting
    • Analytics functions – aggregations, ranking and windowing within a group
    • Read, process and write data using Hive tables
  • Processing Data using Spark SQL
    • Understanding functions
    • Filtering, Aggregations, Joins and Sorting
    • Analytics functions – aggregations, ranking and windowing within a group
    • Read, process and write data using Hive tables
  • Building streaming pipelines using Kafka, Spark SQL and HBase
    • Understanding Kafka
    • Overview of HBase
    • Overview of Spark Streaming using DStreams

Pre-requisites

  • 64 bit laptop with at least 8 GB memory
  • Operating system: Windows 10 with Ubuntu, Linux, or macOS
  • Local admin permissions on the laptop are highly desirable