Intermediate Google Cloud for Data Engineers

The Intermediate Google Cloud for Data Engineers training course is designed for students who are already familiar with the data engineering capabilities of Google Cloud and want to build specialized types of data pipelines, including those for machine learning, streaming data analytics, and recommendation systems.

The course starts by exploring data engineering with unbounded data sets and how streaming data analytics pipelines built with Apache Beam and Dataflow compare to alternatives, including lambda architecture. After working on a data pipeline using Bigtable, Dataflow, and BigQuery, students will learn what it takes to create data pipelines for machine learning and recommendation systems. The course covers the importance of reproducibility when creating training, evaluation, and test data, and then uses TensorFlow together with Apache Beam for feature engineering of both structured and unstructured data.

This course is designed for students who have either taken the Introduction to Google Cloud for Data Engineers course or have equivalent knowledge/experience. The course will be conducted on Google Cloud Platform.

Course Summary

Purpose: 
Learn how to use data engineering on Google Cloud to build specialized data pipelines for large-scale streaming data analytics, machine learning, and recommendation systems.
Audience: 
Developers and developer teams looking to dive deeper into Google Cloud Platform's data science tools.
Skill Level: 
Learning Style: 

Hands-on training is customized, instructor-led training with an in-depth presentation of a technology and its concepts, featuring such topics as Java, OOAD, and Open Source.

Seminars are highly focused, lecture-heavy learning events lasting from half a day to several days. Seminars are a great way to build awareness-level knowledge of a large number of concepts in a short period of time. Think wide (breadth) and thin (depth).

Workshops are instructor-led lab-intensives focused on the practical application of technologies through the facilitation of a project-related lab. Workshops are just the opposite of Seminars. They deliver the highest level of knowledge transfer of any format. Think wide (breadth) and deep (depth).

Duration: 
3 Days
Productivity Objectives: 
  • Create data processing pipelines for streaming data analysis and machine learning
  • Create high-performance, internet-scale, low-latency data stores with Bigtable
  • Develop data pipelines to support machine learning model training and serving
  • Use TensorFlow, Dataflow, and BigQuery for unstructured and structured data pipelines
  • Design and propose scenarios for large-scale data migrations to Google Cloud

What You'll Learn

In the Intermediate Google Cloud for Data Engineers training course you’ll learn:

  • Data Engineering for Unbounded Datasets
    • Bounded vs. Unbounded Datasets
    • Data Velocity vs. Volume + Variety
    • Challenges and Solutions for Streaming Data Pipelines
    • Lambda Architecture
    • Apache Beam and Dataflow
  • Advanced Dataflow for Streaming Data
    • Integration with Cloud Pub/Sub
    • Data De-duplication
    • Late-arriving and Out-of-order Data
    • Session and Sliding Windows
    • Watermarks and Triggers
    • Pipeline Side Inputs
    • Dataflow Templates
  • Advanced BigQuery for Streaming Data
    • Streaming Data Warehousing
    • SQL Analysis of Streaming and Batch Data
    • De-duplication and Data Consistency
    • Cost Estimation and Planning
  • Bigtable
    • Use Cases for Low-Latency, Internet-Scale Storage
    • Wide-Column NoSQL Storage
    • Integration with Colossus Storage
    • Queries with the HBase API
    • Key / Schema Design for Bigtable
    • Bigtable Performance Optimizations
  • Data Engineering for Machine Learning (ML)
    • Machine Learning with Google Cloud
    • Data Engineering for the ML Lifecycle
    • Training, Validation, and Test Datasets for ML
    • Data Hashing for ML Reproducibility
    • Data Engineering for Benchmarks with BigQuery ML
    • Machine Learning Model Training vs. Serving
  • Feature Engineering from Structured Data for ML
    • Motivation for Feature Engineering
    • Feature Pre-Processing vs. Feature Creation
    • SQL and Apache Beam for Feature Engineering
    • TensorFlow Transform API
  • Feature Engineering for Unstructured Image Data for ML
    • Image Transforms for Data Augmentation
    • Google Colaboratory (Colab)
    • TensorFlow Image API
    • Image Format Conversion
    • Image Resizing, Cropping, and Rotation
    • Apache Beam for Image Data Augmentation
  • Data Engineering for Recommendation Systems
    • Recommendation Engines with Transactional Data
    • Cloud SQL Databases for Recommendation Data
    • Recommendation Engines with Apache Spark MLlib
  • Data Migration to Google Cloud
    • Cloud Data Migration Challenges
    • Migration Scenarios and Destinations
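
As a rough illustration of the "Data Hashing for ML Reproducibility" topic listed above, the sketch below shows one common way to make a training/evaluation/test split repeatable: hash a stable record key and bucket the result. It is not taken from the course materials; the function name assign_split, the 80/10/10 percentages, and the date-string keys are all hypothetical choices made for this example.

```python
import hashlib


def assign_split(key: str, train_pct: int = 80, eval_pct: int = 10) -> str:
    """Deterministically map a record key to 'train', 'eval', or 'test'.

    Hypothetical helper: the same key always lands in the same split,
    regardless of row order or how many times the pipeline is re-run.
    """
    # Hash the key with a stable algorithm (not Python's built-in hash(),
    # which is salted per process for strings).
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in the range 0..99

    if bucket < train_pct:
        return "train"
    if bucket < train_pct + eval_pct:
        return "eval"
    return "test"


# Assumed example keys: one record key per calendar day.
records = [f"2019-01-{day:02d}" for day in range(1, 11)]
splits = {key: assign_split(key) for key in records}

for key, split in splits.items():
    print(key, split)
```

Because the split depends only on the key, re-running the script yields the same assignment every time. The choice of key matters: records that share a key always fall into the same split, which helps avoid leakage between training and test data when related rows are keyed together.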

We'll work with you to design a custom Intermediate Google Cloud for Data Engineers training program that meets your specific needs. A 100% guaranteed plan that works for you, your team, and your budget.

Chat with one of our Program Managers from our Boulder, Colorado office to discuss various training options.

DevelopIntelligence has been in the technical/software development learning and training industry for nearly 20 years. We’ve provided learning solutions to more than 48,000 engineers, across 220 organizations worldwide.