Archives: Courses

Hadoop Certifications – HDP Certified Developer – no Java

Introduction

Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large datasets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption: hardware failures are common and should be handled automatically by the framework. Typically, a Hadoop cluster can have a few …

Comprehensive Spark 2 Workshop

As part of this course, we will cover the development life cycle: setting up Java (JDK), Scala, Spark and sbt; installing IntelliJ with the Scala plugin; an overview of IntelliJ; developing a first Scala application; and developing a first Spark application and running it in local mode. We will then learn Scala for Spark: variables and data types in Scala, basic programming constructs, pre-defined functions …

Building ETL Pipelines – 201806

As part of this course, we will cover: an overview of developing applications using Scala; an overview of developing applications using Python; a Spark overview; data processing using DataFrame operations; data processing using Spark SQL; data modeling techniques; performance tuning in Spark; and building ETL pipelines using AWS EMR. A GitHub repository accompanies this course.

Data Engineering Bootcamp

Data engineering, by definition, is the practice of processing data for an enterprise. Over the course of this bootcamp, participants will learn this essential skill and be equipped to process both streaming data and offline batch data. Sections: About the Instructor; Job Roles; Data Engineering; Big Data Ecosystem; Data Engineering vs. Data Science; Curriculum …