Data Engineering Practices

Course Summary

The Data Engineering Practices training course covers the fundamental best practices used by professional data engineering teams.

The course begins with the software development lifecycle and how it applies to data engineering. Next, students learn about testing practices, including unit and integration testing. The course concludes with Continuous Integration/Continuous Delivery (CI/CD) and DevOps best practices.

Purpose	Learn the foundational concepts of distributed computing, distributed data processing, data management and data pipelines.
Audience	Developers, systems administrators, and data scientists looking to learn the fundamental building blocks of big data engineering.
Role	Business Analyst - Data Engineer - Data Scientist - Software Developer - System Administrator
Skill Level	Intermediate
Style	Workshops
Duration	3 Days
Related Technologies	CI/CD | Python

Productivity Objectives

Interpret the best practices in creating and maintaining data engineering pipelines in a professional setting.
Explain the background of distributed systems, relational databases and key-value stores.
Grasp the fundamentals of data stacks, their uses, advantages and limitations.
Recognize the tools for data management, data access, governance and integration, operations and security.

What You'll Learn:

In the Data Engineering Practices training course, you'll learn:

Software Development Lifecycle (and how it applies to Data Engineering)
- Plan, Design, Implement
  - Details on architecture drawings
  - Quick design pattern review around how to ensure safe design
  - How to deploy
- Test, Deploy, Maintain
  - Testing in Data Engineering environments
  - Integration tests
  - Unit tests
  - Maintenance of existing architecture
- Environments
  - Creating maintainable, repeatable environments
    - Docker
    - Vagrant
    - Packer
  - Provisioners to create environments
    - Puppet
    - Chef
    - Ansible
Infrastructure
- Networking and Security
  - Creating VPCs and internal, private networks
  - Network mounting
  - Creating communication between disparate parts of your environment
  - Keeping your network secure
- Maintenance around infrastructure:
  - Infrastructure-as-code (Terraform)
  - Ensuring data integrity
  - Security around the storage of data
  - Compression to maximize data storage capability
  - Block store and mounting
Continuous Integration
- Creating a code repository
  - Hooks
  - Linting
  - Automatic builds on push to master
- Integration Testing
  - Checking code compatibility in various environments
  - Creating mock data entries to check transformations
  - Checking test coverage
- Unit Testing
  - Mocking data inputs
  - Creating effective code coverage
  - Running unit tests automatically
  - Automating unit test runs
  - Setting failed parameters
Summary and Creating Pipelines
- Continuous Deployments
  - Setting up a CI/CD pipeline
    - Jenkins
    - Travis CI
    - AWS CodePipeline
  - Repository management
  - Squash merges
  - Automatic deployments from master
  - Testing on multiple branches
  - Setting up linting in GUIs
  - Naming standards
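
As an illustration of the "Mocking data inputs" and "Creating mock data entries to check transformations" topics above, here is a minimal sketch of a unit test for a pipeline transformation. It is not taken from the course materials: the `normalize_records` function, its row format, and the test class name are all hypothetical, and only Python's standard `unittest` and `unittest.mock` modules are assumed.

```python
import unittest
from unittest.mock import Mock


def normalize_records(fetch_rows):
    """Hypothetical transformation: drop rows without an id, tidy up names.

    `fetch_rows` is any zero-argument callable that returns a list of dicts,
    so a real database reader can be swapped for a mock in tests.
    """
    cleaned = []
    for row in fetch_rows():
        if row.get("id") is None:
            continue  # rows without an id are treated as invalid
        cleaned.append({"id": row["id"], "name": row["name"].strip().lower()})
    return cleaned


class NormalizeRecordsTest(unittest.TestCase):
    def test_drops_rows_without_id_and_cleans_names(self):
        # Mock data entries stand in for the real data source.
        fake_source = Mock(return_value=[
            {"id": 1, "name": "  Alice "},
            {"id": None, "name": "ghost"},
            {"id": 2, "name": "BOB"},
        ])

        result = normalize_records(fake_source)

        self.assertEqual(
            result,
            [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}],
        )
        # The source should be read exactly once, with no arguments.
        fake_source.assert_called_once_with()
```

Saved as a test module, this could be run with `python -m unittest`, and the same command could be wired into a CI job so the tests run on every push.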

Real-World Content

Project-focused demos and labs using your tool stack and environment, not some canned "training room" lab.

Expert Practitioners

Industry experts who bring their battle scars into the classroom.

Experiential Learning

More coding than lecture, coupled with architectural and design discussions.

Tailored Outlines

One-size-fits-all doesn't apply to training teams. That's where we come in!

“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware
