Data Engineering Practices

Course Summary

The Data Engineering Practices training course covers the fundamental best practices used by professional data engineering teams.

The course begins with the software development lifecycle and how it applies to Data Engineering. Next, students learn about process tools, including unit testing and integration testing. The course concludes with Continuous Integration/Continuous Delivery (CI/CD) and DevOps best practices.

Purpose: Learn the foundational concepts of distributed computing, distributed data processing, data management and data pipelines.
Audience: Developers, systems administrators and data scientists looking to learn the fundamental building blocks of big data engineering.
Role: Business Analyst - Data Engineer - Data Scientist - Software Developer - System Administrator
Skill Level: Intermediate
Style: Workshops
Duration: 3 Days
Related Technologies: CI/CD | Python

Productivity Objectives
  • Interpret the best practices in creating and maintaining data engineering pipelines in a professional setting.
  • Explain the background of distributed systems, relational databases and key-value stores.
  • Grasp the fundamentals of data stacks, their uses, advantages and limitations.
  • Recognize the tools for data management, data access, governance and integration, operations and security.

What You'll Learn:

In the Data Engineering Practices training course, you'll learn:
  • Software Development Lifecycle (and how it applies to Data Engineering)
    • Plan, Design, Implement
      • Details on architecture drawings
      • Quick design pattern review around how to ensure safe design
      • How to deploy
    • Test, Deploy, Maintain
      • Testing in Data Engineering environments
      • Integration tests
      • Unit tests
      • Maintenance of existing architecture
    • Environments
      • Creating maintainable, repeatable environments
      • Docker
      • Vagrant
      • Packer
      • Provisioners to create environments
      • Puppet
      • Chef
      • Ansible
  • Infrastructure
    • Networking and Security
      • Creating VPCs and internal, private networks
      • Network mounting
      • Creating communication between disparate parts of your environment
      • Keeping your network secure
    • Infrastructure maintenance
      • Infrastructure-as-code (Terraform)
      • Ensuring data integrity
      • Security around the storage of data
      • Compression to maximize data storage capability
      • Block store and mounting
  • Continuous Integration
    • Creating a code repository
      • Hooks
      • Linting
      • Automatic builds on push to master
    • Integration Testing
      • Checking code compatibility in various environments
      • Creating mock data entries to check transformations
      • Checking test coverage
    • Unit Testing
      • Mocking data inputs
      • Creating effective code coverage
      • Running unit tests automatically
      • Setting failed parameters
  • Summary and Creating Pipelines
    • Continuous Deployments
      • Setting up a CI/CD pipeline
      • Jenkins
      • Travis CI
      • AWS CodePipeline
      • Repository management
      • Squash merges
      • Automatic deployments from master
      • Testing on multiple branches
      • Setting up linting in GUIs
      • Naming standards
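
To give a flavor of the unit testing topics above (mocking data inputs to check a transformation), here is a minimal Python sketch. It is an illustration only, not course material: the `normalize_emails` function and the sample rows are hypothetical and were invented for this example.

```python
import unittest


def normalize_emails(rows):
    """Hypothetical transformation: trim and lowercase the 'email' field,
    dropping rows that have no usable email."""
    cleaned = []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if email:
            cleaned.append({**row, "email": email})
    return cleaned


class NormalizeEmailsTest(unittest.TestCase):
    def test_cleans_and_filters_mock_rows(self):
        # Mock input rows stand in for records read from a real source.
        mock_rows = [
            {"id": 1, "email": "  Ada@Example.COM "},
            {"id": 2, "email": None},
            {"id": 3},
        ]
        self.assertEqual(
            normalize_emails(mock_rows),
            [{"id": 1, "email": "ada@example.com"}],
        )
```

If saved as a module, a test like this could be run locally or from a CI job with `python -m unittest`.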
“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware
