
Creating & Monitoring Big Data Pipelines with Apache Airflow

Course Summary

The Creating & Monitoring Big Data Pipelines with Apache Airflow training course demonstrates how to programmatically author, schedule, and monitor data pipelines (workflows) with Apache Airflow.

The course begins with the core functionalities of Apache Airflow and then moves on to building data pipelines. Next, it explores advanced topics such as start_date and schedule_interval, handling time zones, and much more. The course concludes with monitoring and security in Apache Airflow, as well as managing and deploying workflows in the cloud.

Prerequisites: A basic knowledge of Python and a basic understanding of big data tools (Spark, Hive) are expected.

Purpose
Promote an in-depth understanding of how to use Apache Airflow to create, schedule and monitor data pipelines.
Audience
Data Engineers familiar with Python and big data tools such as Hive and Spark.
Role
Data Scientist - DevOps Engineer
Skill Level
Intermediate
Style
Workshops
Duration
3 Days
Related Technologies
Big Data Training | Apache Airflow | Apache Spark | Python

 

Productivity Objectives
  • Code production-grade data pipelines with Airflow
  • Schedule & monitor data pipelines using Apache Airflow
  • Understand and apply core and advanced concepts of Apache Airflow
  • Create data pipelines using Amazon Managed Workflows for Apache Airflow (MWAA)

What You'll Learn:

In the Creating & Monitoring Big Data Pipelines with Apache Airflow training course, you'll learn:
  • Understand core functionalities of Apache Airflow
    • What is Apache Airflow?
    • How does Apache Airflow work?
    • Installation & setup
    • Understand the Airflow architecture
    • Understand core concepts - DAGs, Tasks, and Operators
    • Understand the interface - Airflow UI tour
    • Use the CLI
  • Build a Data Pipeline (a minimal sketch follows this outline)
    • Sqoop Operator - ingest data from an RDBMS
    • HTTP Sensor - check API availability
    • File Sensor - check for a file
    • Python Operator - download data
    • Bash Operator - move data to HDFS
    • Hive Operator - create Hive tables
    • Spark Submit Operator - run a Spark job
    • Email Operator - send email notifications
    • The data pipeline in action
  • Mastering Apache Airflow (an XCom and retries sketch follows this outline)
    • Understand start_date & schedule_interval
    • Backfill and catchup
    • Deal with time zones
    • Share data - XComs in action
    • Retries & alerts on task failures
    • Pools & priority weights
    • Understand the different executors - Local/Celery/Sequential/Kubernetes
    • Create custom plugins
  • Monitor Apache Airflow
    • Understand the logging system
    • Set up custom logging
    • Store logs in S3
  • Security in Apache Airflow (a Fernet key sketch follows this outline)
    • Encrypt sensitive data with Fernet keys
    • Rotate Fernet keys
    • Hide Variables
    • Enable password authentication
  • Airflow in the Cloud
    • Utilize Amazon Managed Workflows for Apache Airflow (MWAA)
    • Deploy Airflow on a Kubernetes cluster on AWS (EKS)
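
For orientation, here is a minimal sketch of the kind of pipeline the Build a Data Pipeline module walks through, written for Airflow 2.x with the HTTP provider (apache-airflow-providers-http) installed. The DAG name, connection IDs, endpoint, file path, and shell command are illustrative assumptions, not course materials; the Sqoop, Hive, Spark, and Email operators covered in class are omitted to keep the sketch short.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator
    from airflow.providers.http.sensors.http import HttpSensor
    from airflow.sensors.filesystem import FileSensor


    def download_data():
        # Placeholder for the download step; the course implements a real download.
        print("downloading data...")


    with DAG(
        dag_id="example_pipeline",            # hypothetical DAG name
        start_date=datetime(2023, 1, 1),      # start_date, covered in Mastering Apache Airflow
        schedule_interval="@daily",           # schedule_interval, covered in Mastering Apache Airflow
        catchup=False,                        # backfill/catchup behavior, also covered
    ) as dag:
        is_api_available = HttpSensor(
            task_id="is_api_available",
            http_conn_id="my_api",            # assumed Airflow connection
            endpoint="health",                # assumed endpoint
            poke_interval=30,
            timeout=300,
        )

        wait_for_file = FileSensor(
            task_id="wait_for_file",
            filepath="/tmp/input.csv",        # assumed path
            poke_interval=30,
        )

        download = PythonOperator(
            task_id="download_data",
            python_callable=download_data,
        )

        move_to_hdfs = BashOperator(
            task_id="move_to_hdfs",
            bash_command="hdfs dfs -put -f /tmp/input.csv /data/",  # placeholder command
        )

        is_api_available >> wait_for_file >> download >> move_to_hdfs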
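
The Mastering Apache Airflow module covers sharing data between tasks with XComs and configuring retries and alerts on task failures. Here is a minimal sketch of both, again for Airflow 2.x; the DAG name, alert address, and values are illustrative assumptions, and email alerts additionally require SMTP to be configured.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    default_args = {
        "retries": 3,                        # retry a failed task up to 3 times
        "retry_delay": timedelta(minutes=5),
        "email": ["alerts@example.com"],     # assumed alert address
        "email_on_failure": True,
    }


    def produce(ti):
        # Push a value to XCom; returning a value would also auto-push it
        # under the key "return_value".
        ti.xcom_push(key="row_count", value=42)


    def consume(ti):
        row_count = ti.xcom_pull(task_ids="produce", key="row_count")
        print(f"upstream produced {row_count} rows")


    with DAG(
        dag_id="example_xcom",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:
        produce_task = PythonOperator(task_id="produce", python_callable=produce)
        consume_task = PythonOperator(task_id="consume", python_callable=consume)
        produce_task >> consume_task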
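
The Security in Apache Airflow module encrypts connections and Variables with a Fernet key. Here is a minimal sketch of generating one, using the cryptography package that Airflow itself depends on; where the key is stored (the [core] fernet_key setting in airflow.cfg or the AIRFLOW__CORE__FERNET_KEY environment variable) is an operational choice covered in class, and in Airflow 2.x rotation is handled with the airflow rotate-fernet-key CLI command.

    from cryptography.fernet import Fernet

    # Generate a new Fernet key and print it so it can be copied into
    # airflow.cfg ([core] fernet_key) or AIRFLOW__CORE__FERNET_KEY.
    print(Fernet.generate_key().decode())
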
“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware

Dive in and learn more

When transforming your workforce, it's important to have expert advice and tailored solutions. We can help. Tell us your unique needs and we'll explore ways to address them.
