Intermediate Google Cloud for Data Engineers

Course Summary

The Intermediate Google Cloud for Data Engineers training course is designed for students who are already familiar with the data engineering capabilities of Google Cloud and want to advance their skills in building specialized types of data pipelines, including those for machine learning, streaming data analytics, and recommendation systems.

The course starts by exploring data engineering with unbounded datasets and how streaming data analytics pipelines built with Apache Beam and Dataflow compare to alternatives, including lambda architecture. After working on a data pipeline using Bigtable, Dataflow, and BigQuery, students learn what it takes to create data pipelines for machine learning and recommendation systems. The course concludes by covering the importance of reproducibility when creating training, evaluation, and test datasets, and then uses TensorFlow together with Apache Beam for feature engineering of both structured and unstructured data.

Before attending this course, students should take the Google Cloud for Data Engineers course or be familiar with all of the topics it covers.

Purpose
Learn how to use data engineering on Google Cloud to build specialized data pipelines for large-scale streaming data analytics, machine learning, and recommendation systems.
Audience
Data Engineers who want to build specialized types of data pipelines, including those for machine learning, streaming data analytics, and recommendation systems.
Role
Data Engineer - Software Developer - Technical Manager
Skill Level
Intermediate
Style
Fast Track - Targeted Topic - Workshops
Duration
3 Days
Related Technologies
Apache Spark | Google Cloud | TensorFlow | Apache


Productivity Objectives
  • Construct data processing pipelines for streaming data analysis and machine learning.
  • Create high-performance, internet-scale, low-latency data stores with Bigtable.
  • Develop data pipelines to support machine learning model training and serving.
  • Employ TensorFlow, Dataflow, and BigQuery for unstructured and structured data pipelines.
  • Design and propose scenarios for large-scale data migrations to Google Cloud.

What You'll Learn:

In the Intermediate Google Cloud for Data Engineers training course, you'll learn:
  • Data Engineering for Unbounded Datasets
    • Bounded vs. Unbounded Datasets
    • Data Velocity vs. Volume + Variety
    • Challenges and Solutions for Streaming Data Pipelines
    • Lambda Architecture
    • Apache Beam and Dataflow
  • Advanced Dataflow for Streaming Data
    • Integration with Cloud Pub/Sub
    • Data De-duplication
    • Late-arriving and Out-of-order Data
    • Session and Sliding Windows
    • Watermarks and Triggers
    • Pipeline Side Inputs
    • Dataflow Templates
  • Advanced BigQuery for Streaming Data
    • Streaming Data Warehousing
    • SQL Analysis of Streaming and Batch Data
    • De-duplication and Data Consistency
    • Cost Estimation and Planning
  • Bigtable
    • Use Cases for Low-Latency, Internet-Scale Storage
    • Wide-Column NoSQL Storage
    • Integration with Colossus Storage
    • Queries with HBase API
    • Key and Schema Design for Bigtable
    • Bigtable Performance Optimizations
  • Data Engineering for Machine Learning (ML)
    • Machine Learning with Google Cloud
    • Data Engineering for the ML Lifecycle
    • Introduction to ML Use Cases
    • Training, Validation, and Test Datasets for ML
    • Data Hashing for ML Reproducibility
    • Data Engineering for Benchmarks with BigQuery ML
    • Machine Learning Model Training vs. Serving
  • Feature Engineering from Structured Data for ML
    • Motivation for Feature Engineering
    • Feature Pre-Processing vs. Feature Creation
    • SQL and Apache Beam for Feature Engineering
    • TensorFlow Transform API
  • Feature Engineering from Unstructured Image Data for ML
    • Image Transforms for Data Augmentation
    • Google Colaboratory (Colab)
    • TensorFlow Image API
    • Image Format Conversion
    • Image Resizing, Cropping, and Rotation
    • Apache Beam for Image Data Augmentation
  • Data Engineering for Recommendation Systems
    • Recommendation Engines with Transactional Data
    • Cloud SQL Databases for Recommendation Data
    • Recommendation Engines with Apache Spark MLlib
    • Hosting Recommendation Systems with Dataproc
  • Data Migration to Google Cloud
    • Cloud Data Migration Challenges
    • Migration Scenarios and Destinations

“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware

Dive in and learn more

When transforming your workforce, it's important to have expert advice and tailored solutions. We can help. Tell us your unique needs and we'll explore ways to address them.