Durga Gadiraju

Comprehensive Spark 2 Workshop

This course is currently closed

As part of this course, we will cover:

Development life cycle
- Set up Java (JDK), Scala, Spark and sbt
- Install IntelliJ with Scala plugin
- Overview of IntelliJ
- Develop first Scala application
- Develop first Spark application and run it in local mode
Learn Scala for Spark
- Variables and Data types in Scala
- Basic programming constructs
- Pre-defined functions
- User defined functions and anonymous functions
- Object Oriented Concepts
- Collections and Tuples Overview
- Map Reduce APIs
Spark Overview
- Develop first Spark application using Core APIs
- Run it on the cluster and understand execution life cycle
- Understanding shuffling and the difference between groupByKey, reduceByKey and aggregateByKey
- Spark execution modes – local, standalone, Mesos, YARN and Kubernetes
Setting up Spark 2 Cluster using Mesos
- Set up ZooKeeper and Mesos using Mesosphere
- Install Spark on the cluster
- Integrate with file systems – local and S3
- Run Spark job on Mesos and understand execution life cycle
RDD, Data Frames and Data Sets
- Reading and writing RDD using different file systems and file formats
- Building Data Frames, Data Sets
- Difference between RDD, Data Frames and Data Sets
- Building Schema programmatically and building Data Frames
- Writing data frames to different file formats
- Applying compression algorithms while writing the data
Processing Data using Data Frame Operations
- Understanding functions
- Filtering, Aggregations, Joins and Sorting
- Analytics functions – aggregations, ranking and windowing within a group
- Read, process and write data using Hive tables
Processing Data using Spark SQL
- Understanding functions
- Filtering, Aggregations, Joins and Sorting
- Analytics functions – aggregations, ranking and windowing within a group
- Read, process and write data using Hive tables
Building streaming pipelines using Kafka, Spark SQL and HBase
- Understanding Kafka
- Overview of HBase
- Overview of Spark Streaming using DStreams

Pre-requisites

64-bit laptop with at least 8 GB of memory
Operating System: Windows 10 with Ubuntu, Linux or macOS
Local admin permissions on the laptop are highly desirable
