
Creating & Monitoring Big Data Pipelines with Apache Airflow

Course Summary

The Creating & Monitoring Big Data Pipelines with Apache Airflow training course demonstrates how to programmatically author, schedule, and monitor data pipelines (workflows) with Apache Airflow.

The course begins with the core functionalities of Apache Airflow and then moves on to building data pipelines. Next, it explores advanced topics such as start_date and schedule_interval, handling time zones, and much more. The course concludes with monitoring and security in Apache Airflow, as well as managing and deploying workflows in the cloud.

Prerequisites: A basic knowledge of Python and a basic understanding of big data tools (Spark, Hive) are expected.

Purpose
Promote an in-depth understanding of how to use Apache Airflow to create, schedule and monitor data pipelines.
Audience
Data Engineers familiar with Python and big data tools such as Hive and Spark.
Role
Data Scientist - DevOps Engineer
Skill Level
Intermediate
Style
Workshops
Duration
3 Days
Related Technologies
Big Data Training | Apache Airflow | Apache Spark | Python

 

Productivity Objectives
  • Code production-grade data pipelines with Airflow
  • Schedule & monitor data pipelines using Apache Airflow
  • Understand and apply core and advanced concepts of Apache Airflow
  • Create data pipelines using Amazon Managed Workflows for Apache Airflow (MWAA)

What You'll Learn:

In the Creating & Monitoring Big Data Pipelines with Apache Airflow training course, you'll learn:
  • Understand core functionalities of Apache Airflow
    • What is Apache Airflow?
    • How does Apache Airflow work?
    • Installation & setup
    • Understand the Airflow architecture
    • Understand core concepts - DAGs, Tasks, and Operators
    • Understand the interface - Airflow UI tour
    • Use the CLI
  • Build a Data Pipeline (a minimal sketch follows this outline)
    • Sqoop Operator - ingest data from an RDBMS
    • HTTP Sensor - check API availability
    • File Sensor - check for a file
    • Python Operator - download data
    • Bash Operator - move data to HDFS
    • Hive Operator - create Hive tables
    • Spark Submit Operator - run a Spark job
    • Email Operator - send email notifications
    • The data pipeline in action
  • Mastering Apache Airflow (an XCom and retries sketch follows this outline)
    • Understand start_date & schedule_interval
    • Backfill and catchup
    • Deal with time zones
    • Share data - XComs in action
    • Retries & alerts on task failures
    • Pools & priority weights
    • Understand the different executors - Local/Celery/Sequential/Kubernetes
    • Create custom plugins
  • Monitor Apache Airflow
    • Understand the logging system
    • Set up custom logging
    • Store logs in S3
  • Security in Apache Airflow (a Fernet key sketch follows this outline)
    • Encrypt sensitive data with Fernet keys
    • Rotate Fernet keys
    • Hide Variables
    • Enable password authentication
  • Airflow in the Cloud
    • Utilize Amazon Managed Workflows for Apache Airflow (MWAA)
    • Deploy Airflow on a Kubernetes cluster on AWS (EKS)
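
For orientation, here is a minimal sketch of the kind of pipeline the Build a Data Pipeline module walks through, written for Airflow 2.x with the HTTP provider (apache-airflow-providers-http) installed. The DAG name, connection IDs, endpoint, file path, and shell command are illustrative assumptions, not course materials; the Sqoop, Hive, Spark, and Email operators covered in class are omitted to keep the sketch short.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator
    from airflow.providers.http.sensors.http import HttpSensor
    from airflow.sensors.filesystem import FileSensor


    def download_data():
        # Placeholder for the download step; the course implements a real download.
        print("downloading data...")


    with DAG(
        dag_id="example_pipeline",            # hypothetical DAG name
        start_date=datetime(2023, 1, 1),      # start_date, covered in Mastering Apache Airflow
        schedule_interval="@daily",           # schedule_interval, covered in Mastering Apache Airflow
        catchup=False,                        # backfill/catchup behavior, also covered
    ) as dag:
        is_api_available = HttpSensor(
            task_id="is_api_available",
            http_conn_id="my_api",            # assumed Airflow connection
            endpoint="health",                # assumed endpoint
            poke_interval=30,
            timeout=300,
        )

        wait_for_file = FileSensor(
            task_id="wait_for_file",
            filepath="/tmp/input.csv",        # assumed path
            poke_interval=30,
        )

        download = PythonOperator(
            task_id="download_data",
            python_callable=download_data,
        )

        move_to_hdfs = BashOperator(
            task_id="move_to_hdfs",
            bash_command="hdfs dfs -put -f /tmp/input.csv /data/",  # placeholder command
        )

        is_api_available >> wait_for_file >> download >> move_to_hdfs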
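
The Mastering Apache Airflow module covers sharing data between tasks with XComs and configuring retries and alerts on task failures. Here is a minimal sketch of both, again for Airflow 2.x; the DAG name, alert address, and values are illustrative assumptions, and email alerts additionally require SMTP to be configured.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    default_args = {
        "retries": 3,                        # retry a failed task up to 3 times
        "retry_delay": timedelta(minutes=5),
        "email": ["alerts@example.com"],     # assumed alert address
        "email_on_failure": True,
    }


    def produce(ti):
        # Push a value to XCom; returning a value would also auto-push it
        # under the key "return_value".
        ti.xcom_push(key="row_count", value=42)


    def consume(ti):
        row_count = ti.xcom_pull(task_ids="produce", key="row_count")
        print(f"upstream produced {row_count} rows")


    with DAG(
        dag_id="example_xcom",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:
        produce_task = PythonOperator(task_id="produce", python_callable=produce)
        consume_task = PythonOperator(task_id="consume", python_callable=consume)
        produce_task >> consume_task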
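
The Security in Apache Airflow module encrypts connections and Variables with a Fernet key. Here is a minimal sketch of generating one, using the cryptography package that Airflow itself depends on; where the key is stored (the [core] fernet_key setting in airflow.cfg or the AIRFLOW__CORE__FERNET_KEY environment variable) is an operational choice covered in class, and in Airflow 2.x rotation is handled with the airflow rotate-fernet-key CLI command.

    from cryptography.fernet import Fernet

    # Generate a new Fernet key and print it so it can be copied into
    # airflow.cfg ([core] fernet_key) or AIRFLOW__CORE__FERNET_KEY.
    print(Fernet.generate_key().decode())
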
“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware

Dive in and learn more

When transforming your workforce, it's important to have expert advice and tailored solutions. We can help. Tell us your unique needs and we'll explore ways to address them.
