Apache Spark 2 with Python 3 (pyspark)

This course is currently closed

As part of this course you will learn to build scalable applications using Spark 2 with Python as the programming language.


Here are the prerequisites for the course.

  • 64-bit laptop with a 64-bit operating system
  • At least 4 GB RAM (8 GB RAM is strongly recommended)
  • At least a dual-core CPU (a quad-core CPU is strongly recommended)
  • Windows 10, or the latest macOS or Linux operating system

You might be able to learn with other configurations, but we will not provide any support for them. If your machine has a lower configuration than the one above, we highly recommend that you subscribe to our labs, where you can learn most of the skills using the command line.


Let us look at the high-level details of this course.

  • Python is one of the leading programming languages
  • Spark is a distributed computing framework that works with any file system
  • Kafka is a highly scalable and reliable streaming data ingestion tool
  • HBase is a NoSQL database, categorized under Big Data technologies, for real-time use cases

As part of this course we will see how to build end-to-end applications using these different technologies. Here is the detailed agenda for the course.

  • Fundamentals of programming using Python
    • Basic programming constructs using Python 3
    • All about Functions in Python 3
    • Overview of Collections and Types in Python 3
    • Manipulating collections using Map Reduce APIs in Python 3
    • Pandas – Series and Data Frames in Python 3
  • Apache Spark Overview – Architecture and Core APIs
    • Spark Architecture and Execution Modes
    • RDD, DAG and Lazy Evaluation
    • Basic Transformations and Actions
    • Advanced Transformations
    • Execution Life Cycle
    • Accumulators and Broadcast Variables
  • Data Frame Operations and Spark SQL
    • Creating Data Frames and Pre Defined Functions
    • Data Frame Operations – Basic Transformations such as filtering, aggregations, joins, etc.
    • Data Frame Operations – Analytics Functions or Windowing Functions
    • Spark SQL – Basic Transformations such as filtering, aggregations, joins, etc.
    • Spark SQL – Analytics Functions or Windowing Functions
    • Different file formats – text, json, orc, parquet, avro, etc.
    • Reading text data with custom delimiters
    • Compression concepts and algorithms
  • Building Streaming Pipelines using Kafka and Spark Streaming
    • Overview of Kafka
    • Spark Streaming – Legacy
    • Structured Streaming
    • Integrating Kafka with Structured Streaming
    • Overview of HBase
    • Saving processed streaming data in HBase
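
To give a flavour of the "Manipulating collections using Map Reduce APIs in Python 3" topic in the agenda above, here is a minimal sketch using only the Python 3 standard library. The sample `orders` data and the `completed_revenue` helper are hypothetical, invented for illustration; they are not taken from the course material.

```python
from functools import reduce

# Hypothetical sample data: (order_id, status, amount) tuples,
# invented for illustration only.
orders = [
    (1, "COMPLETE", 120.0),
    (2, "PENDING", 75.5),
    (3, "COMPLETE", 30.25),
    (4, "CLOSED", 210.0),
]


def completed_revenue(records):
    """Sum the amounts of COMPLETE orders using filter, map and reduce."""
    # filter: keep only the records whose status is COMPLETE
    completed = filter(lambda rec: rec[1] == "COMPLETE", records)
    # map: project each remaining record to its amount
    amounts = map(lambda rec: rec[2], completed)
    # reduce: fold the amounts into a single total, starting from 0.0
    return reduce(lambda total, amount: total + amount, amounts, 0.0)


print(completed_revenue(orders))
```

The same filter, map and reduce pattern is what Spark's RDD transformations and actions generalize to distributed data sets, which is presumably why the agenda covers it before the Spark sections.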

Before getting started, let us understand the prerequisites and set up the environment for the course. The demo will be given on a Windows machine, but the instructions are not very different for Mac or Linux.

About ITVersity

ITVersity is an online learning platform focused on emerging technologies such as Big Data and DevOps. Our platform has the following components.

  • Content (both free, via our YouTube channel, and premium)
  • Labs with Support – to accelerate learning in simulated yet realistic environments
  • Community – 24×7 self-supported community to learn emerging technologies

We conduct live sessions regularly on our YouTube channel. Please subscribe to get notifications for our live sessions.
