As part of this course, we will cover the following topics:
- Development life cycle
- Setup Java, JDK, Scala, Spark and sbt
- Install IntelliJ with Scala plugin
- Overview of IntelliJ
- Develop first Scala application
- Develop first Spark application and run it in local mode
- Learn Scala for Spark
- Variables and Data types in Scala
- Basic programming constructs
- Pre-defined functions
- User defined functions and anonymous functions
- Object Oriented Concepts
- Collections and Tuples
- Overview of Map Reduce APIs
- Spark Overview
- Develop first Spark application using Core APIs
- Run it on the cluster and understand execution life cycle
- Understanding shuffling and the difference between groupByKey, reduceByKey and aggregateByKey (sketched in Scala after this outline)
- Spark execution modes – local, standalone, Mesos, YARN and Kubernetes
- Setting up a Spark 2 cluster using Mesos
- Setup Zookeeper and Mesos using Mesosphere
- Install Spark on the cluster
- Integrate with file systems – local and S3
- Run Spark jobs on Mesos and understand the execution life cycle
- RDD, Data Frames and Data Sets
- Reading and writing RDD using different file systems and file formats
- Building Data Frames, Data Sets
- Difference between RDD, Data Frames and Data Sets
- Building schemas programmatically and constructing Data Frames (see the schema sketch after this outline)
- Writing Data Frames to different file formats
- Applying compression algorithms while writing the data
- Processing Data using Data Frame Operations
- Understanding functions
- Filtering, Aggregations, Joins and Sorting
- Analytics functions – aggregations, ranking and windowing within a group (see the window function sketch after this outline)
- Read, process and write data using Hive tables
- Processing Data using Spark SQL (see the Spark SQL sketch after this outline)
- Understanding functions
- Filtering, Aggregations, Joins and Sorting
- Analytics functions – aggregations, ranking and windowing within a group
- Read, process and write data using Hive tables
- Building streaming pipelines using Kafka, Spark SQL and HBase
- Understanding Kafka
- Overview of HBase
- Overview of Spark Streaming using DStreams
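
A few illustrative Scala sketches for the topics flagged above follow. First, a minimal sketch contrasting groupByKey, reduceByKey and aggregateByKey on a small pair RDD; the sample data, application name and local[*] master are assumptions made for the example, not course material.

```scala
import org.apache.spark.sql.SparkSession

object ShuffleDemo {
  def main(args: Array[String]): Unit = {
    // local[*] keeps everything on the laptop; the app name is just a label for this sketch
    val spark = SparkSession.builder().master("local[*]").appName("ShuffleDemo").getOrCreate()
    val sc = spark.sparkContext

    // hypothetical (word, count) pairs
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4)))

    // groupByKey ships every value across the shuffle, then sums on the reducer side
    val viaGroup = pairs.groupByKey().mapValues(_.sum)

    // reduceByKey combines values map-side before shuffling, so less data moves
    val viaReduce = pairs.reduceByKey(_ + _)

    // aggregateByKey lets the result type differ from the value type: (sum, count) per key
    val viaAggregate = pairs.aggregateByKey((0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1),
      (a, b) => (a._1 + b._1, a._2 + b._2)
    )

    viaGroup.collect().foreach(println)
    viaReduce.collect().foreach(println)
    viaAggregate.collect().foreach(println)

    spark.stop()
  }
}
```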
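
Next, a sketch of building a Data Frame from a programmatically defined schema and writing it out with compression; the column names, output path and snappy codec are illustrative assumptions.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SchemaDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("SchemaDemo").getOrCreate()

    // define the schema explicitly instead of relying on inference
    val schema = StructType(Seq(
      StructField("order_id", IntegerType, nullable = false),
      StructField("order_status", StringType, nullable = true)
    ))

    // a couple of hand-made rows; real data would come from files or tables
    val rows = spark.sparkContext.parallelize(Seq(Row(1, "COMPLETE"), Row(2, "PENDING")))
    val orders = spark.createDataFrame(rows, schema)

    // write as Parquet with snappy compression; the path is a placeholder
    orders.write
      .mode("overwrite")
      .option("compression", "snappy")
      .parquet("/tmp/orders_parquet")

    spark.stop()
  }
}
```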
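
The window function sketch below shows aggregation and ranking within a group using the Data Frame API; the department/revenue data set is made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rank, sum}

object WindowDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("WindowDemo").getOrCreate()
    import spark.implicits._

    // made-up revenue rows keyed by department
    val daily = Seq(
      ("Electronics", "2024-01-01", 1200.0),
      ("Electronics", "2024-01-02", 800.0),
      ("Clothing", "2024-01-01", 300.0),
      ("Clothing", "2024-01-02", 450.0)
    ).toDF("department", "order_date", "revenue")

    val byDept = Window.partitionBy("department")                           // aggregation within a group
    val rankDesc = Window.partitionBy("department").orderBy($"revenue".desc) // ranking within a group

    daily
      .withColumn("dept_revenue", sum("revenue").over(byDept))
      .withColumn("revenue_rank", rank().over(rankDesc))
      .show()

    spark.stop()
  }
}
```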
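
Finally, the Spark SQL sketch runs a filter, aggregation and sort with spark.sql; the orders table is hypothetical, and enableHiveSupport assumes a Hive metastore is reachable.

```scala
import org.apache.spark.sql.SparkSession

object SparkSQLDemo {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets spark.sql resolve tables registered in the Hive metastore
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SparkSQLDemo")
      .enableHiveSupport()
      .getOrCreate()

    // filter, aggregate and sort against a hypothetical Hive table named orders
    spark.sql(
      """SELECT order_status, count(1) AS order_count
        |FROM orders
        |WHERE order_date >= '2024-01-01'
        |GROUP BY order_status
        |ORDER BY order_count DESC""".stripMargin
    ).show()

    spark.stop()
  }
}
```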
Pre-requisites
- 64-bit laptop with at least 8 GB of memory
- Operating System: Windows 10 with Ubuntu, Linux, or macOS
- Local admin permissions on the laptop are highly desirable