Big Data on Cloud (Hadoop and Spark on AWS)

Are you familiar with Big Data technologies such as Hadoop and Spark, and do you want to learn how to build Big Data pipelines using the pay-as-you-go model of a cloud platform such as AWS?

This course is the answer.

Here are the prerequisites for the course.


  • Basic programming skills in Python, Scala, or both
  • Good knowledge of distributed file systems such as HDFS
  • Experience with or knowledge of distributed resource management frameworks such as YARN or Mesos
  • Good knowledge of distributed computing frameworks such as MapReduce and Spark
  • Basic knowledge of data warehousing, ETL, and data integration frameworks


Here is the curriculum for the course. If you are already familiar with Big Data technologies, you can quickly add this important skill by going through this course.

  • Overview of AWS basics (EC2, S3, EBS, networking, security, CLI, etc.)
  • Overview of AWS analytics services and a comparison of on-premise clusters with cloud services. This session includes creating an EMR cluster using the quick options.
  • Step execution and other advanced options of EMR
  • Quick revision of programming language – Scala 2.11
  • Quick revision of programming language – Python 3.x (including DataFrames)
  • Development life cycle of Spark 2 applications using Scala (using IntelliJ)
  • Development life cycle of Spark 2 applications using Python (using PyCharm)
  • Running Scala and Python applications on an EMR cluster
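
To give a rough idea of what step execution on EMR looks like in practice, here is a minimal Python sketch that builds the step definitions for one Python and one Scala Spark application. It is an illustration under assumptions, not material from the course: the helper name build_spark_step, the bucket example-bucket, the file names, and the class name are all hypothetical. The dictionary layout follows the step format accepted by the boto3 EMR client, where command-runner.jar is used to invoke spark-submit on the cluster.

```python
from typing import Any, Dict, List, Optional


def build_spark_step(
    name: str,
    app_uri: str,
    app_args: Optional[List[str]] = None,
    main_class: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one EMR step that runs spark-submit via command-runner.jar.

    build_spark_step is a hypothetical helper written for this sketch.
    """
    args = ["spark-submit", "--deploy-mode", "cluster"]
    if main_class:
        # Scala applications packaged as a JAR need their entry point class.
        args += ["--class", main_class]
    args.append(app_uri)
    args += list(app_args or [])
    return {
        "Name": name,
        # Keep the cluster running even if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


# Hypothetical S3 locations; replace them with your own bucket and files.
python_step = build_spark_step(
    "PySpark word count",
    "s3://example-bucket/jobs/wordcount.py",
    app_args=["s3://example-bucket/input/", "s3://example-bucket/output/"],
)
scala_step = build_spark_step(
    "Scala word count",
    "s3://example-bucket/jobs/wordcount.jar",
    main_class="example.WordCount",
)

# Submitting the steps needs boto3, AWS credentials, and a running cluster:
#   import boto3
#   emr = boto3.client("emr")
#   emr.add_job_flow_steps(JobFlowId="<your-cluster-id>",
#                          Steps=[python_step, scala_step])
```

Building the step as plain data keeps the sketch runnable without an AWS account. Only the commented-out call at the end talks to AWS, and running it on a live cluster incurs charges.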

We might offer another course in the near future covering DynamoDB, Kinesis, etc., to dive deeper into other services in the analytics category of AWS.

There is no lab associated with the course. To practice what is demonstrated in the course, you might have to pay AWS for the resources you use.
