Lesson Category: Data Engineering

Getting Started – Apache Kafka

Let us understand the concepts behind Kafka for moving data from sources to targets in real time. Topics covered: Setup Datasets; Setup Kafka locally; Kafka on multi-node cluster; Apache Kafka – Overview; Apache Kafka – Glossary or Concepts; Zookeeper Commands – Overview; Producer and Consumer – Different Scenarios; Messages and Message Format; Setup gen_logs. One of the …
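The producer and consumer topics listed above can be pictured with a small in-memory sketch in Python. It does not use Kafka or any Kafka client library; the `Topic` class, its methods, and the sample log lines are hypothetical names made up for illustration. It only shows the basic idea that producers append messages to a log while each consumer group keeps track of its own read position (offset).

```python
from collections import defaultdict


class Topic:
    """Hypothetical in-memory stand-in for a single-partition topic."""

    def __init__(self, name):
        self.name = name
        self.log = []                    # append-only list of messages
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def produce(self, message):
        """Append a message and return the offset it was stored at."""
        self.log.append(message)
        return len(self.log) - 1

    def consume(self, group, max_messages=10):
        """Return unread messages for a group and advance that group's offset."""
        start = self.offsets[group]
        batch = self.log[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch


topic = Topic("web_logs")
for line in ["GET /home", "GET /cart", "POST /checkout"]:
    topic.produce(line)

first_batch = topic.consume("analytics", max_messages=2)
second_batch = topic.consume("analytics")  # continues where the group left off
audit_batch = topic.consume("audit")       # a different group starts from offset 0
```

Because offsets are stored per group, the `analytics` and `audit` readers each see every message independently, which is roughly how separate consumer groups behave on a real topic.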


Data Ingestion – Apache Sqoop

Apache Sqoop is used to copy data between HDFS and relational databases (RDBMS). As part of this lesson, we will see how to perform Sqoop operations: getting information from databases (list-databases, list-tables and eval); Sqoop import and import-all-tables; Sqoop export; supported file formats; handling field delimiters; incremental import; and more. Topics covered: Introduction and Objectives; Accessing Documentation …
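As a rough illustration of how the Sqoop tools named above are invoked, the Python sketch below only assembles command-line argument lists; it does not run Sqoop. The helper `sqoop_command`, the JDBC URL, the user name, and the paths are all hypothetical, and password handling is deliberately left out. The flags shown (`--connect`, `--username`, `--table`, `--target-dir`, `--fields-terminated-by`, `--num-mappers`) are standard Sqoop arguments.

```python
import shlex


def sqoop_command(tool, connect, username, **options):
    """Build an argument list for a Sqoop tool such as list-tables or import.

    Keyword options become long flags, e.g. target_dir -> --target-dir.
    """
    args = ["sqoop", tool, "--connect", connect, "--username", username]
    for key, value in options.items():
        args.append("--" + key.replace("_", "-"))
        if value is not True:  # True marks a flag that takes no value
            args.append(str(value))
    return args


# Hypothetical connection details, for illustration only.
jdbc_url = "jdbc:mysql://dbhost.example.com:3306/retail_db"

list_tables = sqoop_command("list-tables", jdbc_url, "retail_user")
import_orders = sqoop_command(
    "import", jdbc_url, "retail_user",
    table="orders",
    target_dir="/user/demo/orders",
    fields_terminated_by="|",
    num_mappers=1,
)

# Print a shell-safe version of the import command for inspection.
print(" ".join(shlex.quote(arg) for arg in import_orders))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later passed to something like `subprocess.run`.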


Getting Started with CCA using Scala

As part of this lesson we will explore the skills required for the CCA 175 certification. The agenda for this lesson: Introduction; Curriculum; Required Skills; Setup Environment; HDFS and YARN; Data Sets; Windows Environment (labs); Using labs for Preparation.

Scala Fundamentals for Spark

As part of this lesson we will explore the Scala skills relevant for learning Spark, especially for the certifications. What is Scala? Scala is a JVM-based functional programming language. Why Scala? Even though Scala has been around for more than a decade (its design began in 2001), it has gained a lot of momentum with Spark. Spark is completely developed …


Basics of programming using Python

Why Python? Python is designed to be highly readable. It frequently uses English keywords where other languages use punctuation. Python is interpreted, interactive and object-oriented. History of Python: Guido van Rossum began developing Python in the late 1980s and early 1990s. The first version of Python (Python 1.0) was released in November …
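The readability point can be seen in a short example. The function and data below are hypothetical and exist only to show keywords such as `and`, `not`, `in` and `is` doing the work that symbols like `&&` and `!` do in many other languages.

```python
def can_checkout(cart, user):
    """Return True when a known, unflagged user has at least one item."""
    return user is not None and len(cart) > 0 and "banned" not in user["flags"]


cart = ["book", "pen"]
user = {"name": "asha", "flags": []}

allowed = can_checkout(cart, user)
empty_cart_allowed = can_checkout([], user)
anonymous_allowed = can_checkout(cart, None)
```

The condition reads close to an English sentence, which is the sense in which Python favors keywords over punctuation.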
