Building Streaming Data Pipelines – Using Kafka and Spark


Let us learn how to build streaming data pipelines using technologies such as Logstash, Kafka, Spark Structured Streaming, Spark legacy streaming, HBase, and more. We will also cover how to set up a multi-broker Kafka cluster as part of this course.

As part of this course we will explore Kafka in detail while working through one of the most common use cases for Kafka and Spark: building streaming data pipelines. The following are the technologies we will be using in this workshop.

  • IDE – IntelliJ
  • Programming Language – Scala
  • Ingest messages from web server log files – Kafka Connect
  • Channel the data – Kafka (covered extensively)
  • Consume, process, and save – Spark Streaming
  • Data store for processed data – HBase

All of these technologies come pre-installed on the Big Data cluster we will use for the demos.
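The pipeline above boils down to Spark reading from a Kafka topic and writing the processed records out. Here is a minimal Structured Streaming sketch of that shape; the topic name `web-server-logs` and the bootstrap server address are illustrative assumptions, and a console sink stands in for HBase so the example stays self-contained (writing to HBase requires a custom sink or connector, which the course covers separately).

```scala
import org.apache.spark.sql.SparkSession

object StreamingPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingPipelineSketch")
      .getOrCreate()

    // Read log lines that Kafka Connect has published to a topic
    // (topic and broker address are assumptions for this sketch)
    val logs = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "web-server-logs")
      .load()
      .selectExpr("CAST(value AS STRING) AS line")

    // Print each micro-batch; a real pipeline would persist to HBase here
    val query = logs.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

Running this requires the `spark-sql-kafka` package on the classpath and a Kafka broker to connect to.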

Here is the flow of the course:

  • Set up the development environment for building streaming applications
  • Set up everything on a single node (Logstash, HDFS, Spark, Kafka, etc.)
  • Overview of Kafka
  • Multi-broker, multi-server Kafka setup
  • Overview of streaming technologies and Spark Streaming
  • Overview of NoSQL Databases and HBase
  • Development life cycle of HBase application
  • Case Study: Kafka at LinkedIn
  • Final Demo: Streaming Data Pipelines
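The multi-broker setup mentioned in the outline amounts to running several broker processes, each with a unique id, listener port, and log directory. A minimal local sketch, assuming a Kafka installation under `$KAFKA_HOME` and illustrative port numbers and topic name:

```shell
# Copy the default broker config once per broker; each copy needs
# unique values for broker.id, listeners, and log.dirs, e.g.:
#   server-1.properties: broker.id=1, listeners=PLAINTEXT://:9093, log.dirs=/tmp/kafka-logs-1
#   server-2.properties: broker.id=2, listeners=PLAINTEXT://:9094, log.dirs=/tmp/kafka-logs-2
cp $KAFKA_HOME/config/server.properties $KAFKA_HOME/config/server-1.properties
cp $KAFKA_HOME/config/server.properties $KAFKA_HOME/config/server-2.properties

# Start each broker with its own config file
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server-1.properties &
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server-2.properties &

# Create a topic replicated across the brokers
$KAFKA_HOME/bin/kafka-topics.sh --create --topic web-server-logs \
  --bootstrap-server localhost:9093 --partitions 2 --replication-factor 2
```

With a replication factor of 2, each partition has a copy on a second broker, so the topic survives a single broker failure.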


