Data Engineering using Databricks on AWS and Azure
Course Details
In this course, you will learn Data Engineering using Databricks. The course covers the following topics:
- Getting Started with Databricks
- Set up Local Development Environment to develop Data Engineering Applications using Databricks
- Using Databricks CLI to manage files, jobs, clusters, etc. related to Data Engineering Applications
- Spark Application Development Cycle to build Data Engineering Applications
- Databricks Jobs and Clusters
- Deploy and Run Data Engineering Jobs on Databricks Job Clusters as Python Applications
- Deploy and Run Data Engineering Jobs on Job Cluster using Notebooks
- Deep Dive into Delta Lake using Dataframes
- Deep Dive into Delta Lake using Spark SQL
- Building Data Engineering Pipelines using Spark Structured Streaming on Databricks Clusters
- Incremental File Processing using Spark Structured Streaming leveraging Databricks Auto Loader cloudFiles
- Overview of Auto Loader cloudFiles File Discovery Modes – Directory Listing and File Notifications
- Differences between Auto Loader cloudFiles File Discovery Modes – Directory Listing and File Notifications
- Differences between traditional Spark Structured Streaming and Databricks Auto Loader cloudFiles for incremental file processing
- Overview of Databricks SQL for Data Analysis and Reporting
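As a small preview of the Auto Loader topics listed above, here is a minimal sketch of how the two cloudFiles file discovery modes might be selected through reader options. It is an illustration under stated assumptions, not the course's own code: the helper names `auto_loader_options` and `read_incremental` and the example paths are hypothetical, and the stream itself can only be started on a Databricks cluster, where the `cloudFiles` source is available.

```python
from typing import Dict


def auto_loader_options(
    file_format: str,
    schema_location: str,
    use_notifications: bool = False,
) -> Dict[str, str]:
    """Build reader options for Auto Loader (hypothetical helper).

    use_notifications=False keeps the default Directory Listing mode;
    use_notifications=True asks for File Notifications mode instead.
    """
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.useNotifications": "true" if use_notifications else "false",
    }


def read_incremental(spark, input_path: str, options: Dict[str, str]):
    """Return a streaming DataFrame that picks up new files incrementally.

    Assumes `spark` is a SparkSession on a Databricks cluster; this
    function is not exercised outside Databricks.
    """
    reader = spark.readStream.format("cloudFiles")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(input_path)


# Hypothetical paths, used only to show the shape of the options.
listing_opts = auto_loader_options("json", "/tmp/schemas/orders")
notify_opts = auto_loader_options(
    "json", "/tmp/schemas/orders", use_notifications=True
)
print(listing_opts["cloudFiles.useNotifications"])
print(notify_opts["cloudFiles.useNotifications"])
```

On a Databricks cluster, either dictionary could be passed to `read_incremental(spark, "<input path>", listing_opts)`. The only difference between the two modes in this sketch is the `cloudFiles.useNotifications` option.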
Here are the YouTube videos covering the Databricks-related content for this course.
- Databricks Platform Features – Deep Dive into Delta Lake using PySpark Data Frames
- Databricks Platform Features – Deep Dive into Delta Lake using Spark SQL