As part of this course, we will explore Kafka in detail while working through one of the most common use cases of Kafka and Spark – Building Streaming Data Pipelines. The following are the technologies we will use as part of this workshop.
- IDE – IntelliJ
- Programming Language – Scala
- Get messages from web server log files – Kafka Connect
- Channelize data – Kafka (it will be covered extensively)
- Consume, process, and save – Spark Streaming, using Scala as the programming language (a minimal consumer sketch follows this list)
- Data store for processed data – HBase
- We will use our Big Data cluster for the demo, where all of these technologies are pre-installed.
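To make the pipeline concrete, here is a minimal sketch of the consume-and-process step: a Spark Streaming job that uses the spark-streaming-kafka-0-10 integration to subscribe to a Kafka topic and count log lines per batch. The broker address, group id, and the topic name web_server_logs are assumptions for illustration only; a real job would parse each record and write the results to HBase.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object LogStreamConsumer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LogStreamConsumer").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Kafka consumer settings; the broker address and group id are assumptions
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "web-logs-consumer",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Topic name is hypothetical; it would come from the Kafka Connect setup
    val topics = Array("web_server_logs")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
    )

    // Count log lines per 10-second batch and print the result to the console
    stream.map(record => record.value)
      .count()
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

In the course demo, the print step is replaced by writes to HBase, which is covered in the HBase portion of the flow below.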
Here is the flow of the course:
- Set up the development environment to build streaming applications
- Set up everything on a single node (Logstash, HDFS, Spark, Kafka, etc.)
- Overview of Kafka
- Multi-broker/multi-server setup of Kafka
- Overview of streaming technologies and Spark Streaming
- Overview of NoSQL Databases and HBase
- Development life cycle of an HBase application (see the sketch after this list)
- Case Study: Kafka at LinkedIn
- Final Demo: Streaming Data Pipelines
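As a companion to the HBase portion of the flow and the final demo, here is a hedged sketch of the last step of the pipeline: writing a processed record into HBase with the standard HBase client API. The table name web_log_metrics, the column family stats, and the row key format are assumptions for illustration, not part of the course material.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseWriteSketch {
  def main(args: Array[String]): Unit = {
    // Standard HBase client configuration; reads hbase-site.xml from the classpath
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    try {
      // Table and column family names are hypothetical
      val table = connection.getTable(TableName.valueOf("web_log_metrics"))

      // Row key is a batch timestamp; the value is the per-batch request count
      val put = new Put(Bytes.toBytes("2024-01-01T00:00"))
      put.addColumn(Bytes.toBytes("stats"), Bytes.toBytes("request_count"), Bytes.toBytes("42"))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```

In the final demo, a write like this runs inside the Spark Streaming job for every processed batch rather than as a standalone program.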