The Creating & Monitoring Big Data Pipelines with Apache Airflow training course demonstrates how to create, schedule, and monitor data pipelines by programmatically authoring workflows with Apache Airflow.
The course begins with the core functionalities of Apache Airflow and then moves on to building data pipelines. Next, it explores advanced topics such as start_date and schedule_interval, dealing with time zones, and much more. The course concludes by analyzing how to handle monitoring and security with Apache Airflow, as well as managing and deploying workflows in the cloud.
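For orientation, a minimal sketch of an Airflow 2.x DAG using these parameters might look like the following; the DAG id, task, and timezone are illustrative choices, not material from the course itself:

```python
import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_pipeline",  # hypothetical DAG name
    # A timezone-aware start_date (pendulum handles the tz logic)
    start_date=pendulum.datetime(2023, 1, 1, tz="Europe/Paris"),
    schedule_interval="@daily",  # run once per day
    catchup=False,               # do not backfill runs before "now"
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="echo 'extracting data'",
    )
```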
Prerequisites: A basic knowledge of Python and a basic understanding of big data tools (e.g., Spark and Hive) are expected.
Purpose
| Promote an in-depth understanding of how to use Apache Airflow to create, schedule, and monitor data pipelines. |
Audience
| Data Engineers familiar with Python and big data tools such as Hive and Spark. |
Role
| Data Scientist - DevOps Engineer |
Skill Level
| Intermediate |
Style
| Workshops |
Duration
| 3 Days |
Related Technologies
| Big Data Training | Apache Airflow | Apache Spark | Python |
Productivity Objectives
- Code production-grade data pipelines with Airflow
- Schedule and monitor data pipelines using Apache Airflow
- Understand and apply core and advanced concepts of Apache Airflow
- Create data pipelines using AWS MWAA (Managed Workflows for Apache Airflow), as sketched below
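Since MWAA runs standard Airflow, deploying a pipeline amounts to uploading the DAG file to the S3 location the environment is configured to read from. A minimal sketch, assuming a hypothetical bucket name and the common dags/ prefix for the environment's DagS3Path:

```python
import boto3

# Upload a DAG file to the S3 prefix an MWAA environment watches.
# "my-mwaa-bucket" is a placeholder; use the source bucket configured
# for your own environment. MWAA picks up files under its DagS3Path.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="dags/example_pipeline.py",  # local DAG file (e.g., the sketch above)
    Bucket="my-mwaa-bucket",              # hypothetical MWAA source bucket
    Key="dags/example_pipeline.py",       # must match the environment's DagS3Path
)
```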