
Google Cloud for Data Engineers

Course Summary

The Google Cloud for Data Engineers training course teaches students the fundamentals of Google Cloud Platform (GCP) for building and running data pipelines that process batch or streaming data.

The course begins with the GCP services most frequently used by data engineers. Students then build on their existing SQL, Hadoop, and Python skills by learning how to reuse existing applications on managed MySQL and Hadoop/Spark infrastructure in GCP. Most of the course focuses on the differentiating data engineering capabilities of GCP: students learn to process, analyze, and store petabytes of batch and streaming data with serverless services such as Pub/Sub, BigQuery, and Dataflow. For example, students work with Apache Beam code that goes beyond the limitations of the original MapReduce framework. The course concludes with an introduction to the machine learning capabilities of GCP that data engineers can start using without prior data science experience.

The course also provides architectural overviews of data processing pipelines on GCP, along with guidance on choosing the right GCP services for your project.

Purpose
Learn to build systems on Google Cloud to store and process batch or streaming data.
Audience
Developers and developer teams looking to use streaming data on GCP.
Role
Data Engineer - Software Developer
Skill Level
Introduction
Style
Fast Track - Hack-a-thon - Learning Spikes - Workshops
Duration
3 Days
Related Technologies
BigQuery | Hadoop | Google Cloud | Python | MySQL | Apache

 

Productivity Objectives
  • Describe the capabilities of Google Cloud for data engineering.
  • Build and run data processing pipelines on GCP to ingest, analyze, and store data.
  • Identify how to use managed Google Cloud infrastructure for MySQL and Hadoop/Spark.
  • Discuss when and how to use Pub/Sub, Dataflow, and BigQuery for serverless data pipelines.
  • Integrate data pipelines with other GCP services.
  • Identify what criteria to use for the design of data processing pipelines on GCP.

What You'll Learn:

In the Google Cloud for Data Engineers training course, you'll learn:
  • Google Cloud Basics
    • Why Google Cloud
    • Managed Virtual Infrastructure vs. Serverless
    • Google Cloud for Data Engineers
  • Compute Engine
    • Virtualized Infrastructure
    • Cloud Shell
    • Persistent vs. Transient Storage
    • Compute Engine User Interface
    • Preemptible Instances
  • Cloud Storage (GCS)
    • Object Storage and Buckets
    • Integration with GCP
    • Web-based and Command Line Interfaces
  • Cloud SQL
    • Provisioning Managed Database Infrastructure
    • Configuration of MySQL on GCP
    • Batch Data Import/Export with Cloud SQL
    • Web-based Interface
    • Integration of MySQL with GCP Services and Applications
  • Cloud Pub/Sub
    • Distributed Messaging Basics
    • Publish/Subscribe Messaging Model
    • Topics and Subscriptions for Messaging
    • Command Line and Python Interfaces
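The publish/subscribe model covered in this module — a topic fans each message out to independent subscriptions, and each subscription consumes its copy at its own pace — can be sketched in plain Python. This is a minimal in-memory illustration of the model only, not the `google-cloud-pubsub` client library, and the topic and message names are made up:

```python
from collections import defaultdict

class Topic:
    """Toy in-memory model of Pub/Sub semantics: one topic,
    many subscriptions, each with its own message queue."""

    def __init__(self):
        self.subscriptions = defaultdict(list)  # name -> queued messages

    def subscribe(self, name):
        self.subscriptions[name]  # creates the subscription's queue
        return name

    def publish(self, message):
        # Every subscription receives its own copy of the message.
        for queue in self.subscriptions.values():
            queue.append(message)

    def pull(self, name):
        # Each subscription consumes its queue independently.
        queue = self.subscriptions[name]
        return queue.pop(0) if queue else None

topic = Topic()
topic.subscribe("analytics")
topic.subscribe("archiver")
topic.publish(b"sensor-reading-1")

print(topic.pull("analytics"))  # b'sensor-reading-1'
print(topic.pull("archiver"))   # b'sensor-reading-1'
print(topic.pull("analytics"))  # None -- already consumed
```

In the real service the same shape applies: a publisher client sends bytes to a topic, and each subscription delivers every message to its subscribers independently.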
  • Datastore
    • Object-Relational Impedance Mismatch
    • Datastore for Transactional Data
    • Java APIs for Datastore
  • Machine Learning APIs
    • Colaboratory
    • Vision, Natural Language, and Translate APIs
    • AutoML Vision
  • Dataproc
    • MapReduce Framework
    • Provisioning Managed Apache Hadoop/Spark/YARN Infrastructure
    • Customizing Apache Bigtop Distribution
    • Preemptible Instances for MapReduce
    • Dataproc User and Command Line Interfaces
    • Map vs. FlatMap for MapReduce
    • Running Apache Hive, Apache Pig, and PySpark
    • Running and Monitoring MapReduce Jobs
    • Storage Migration from HDFS to GCS
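The Map vs. FlatMap distinction in the Dataproc module can be sketched in plain Python — the same semantics apply to Spark's `map` and `flatMap` transformations. The input lines below are an illustrative word-count-style dataset, not course data:

```python
from itertools import chain

lines = ["the quick fox", "jumps"]

# map: exactly one output element per input element,
# so tokenizing yields a list of token lists.
mapped = [line.split() for line in lines]
print(mapped)        # [['the', 'quick', 'fox'], ['jumps']]

# flatMap: each input may yield zero or more outputs, flattened
# into a single collection -- what a word count actually needs.
flat_mapped = list(chain.from_iterable(line.split() for line in lines))
print(flat_mapped)   # ['the', 'quick', 'fox', 'jumps']
```

The practical takeaway: use a flatMap-style transformation whenever one input record should expand into a variable number of output records.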
  • Dataflow
    • Apache Beam Framework
    • Batch and Streaming Data Processing Pipelines
    • Run Apache Beam in Cloud Shell
    • Apache Beam Combine vs. GroupBy
    • Submitting Apache Beam Pipelines
    • Running Batch and Streaming Dataflow Jobs
    • Apache Beam Pipelines with Side-Inputs
    • Autoscaling Streaming Apache Beam Jobs
    • Apache Beam Windows and Triggers
    • Web-based and Command Line Interfaces
    • Monitoring Dataflow Jobs
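The Combine vs. GroupBy distinction in the Dataflow module can be sketched in plain Python. In Apache Beam, a combine-style aggregation folds values into a running accumulator (which the runner can parallelize as partial combines), while a GroupByKey materializes every value for a key before downstream processing. The `(key, value)` pairs below are made-up sample data:

```python
from collections import defaultdict

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

# GroupBy-style: materialize all values per key, then reduce.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)
group_then_sum = {k: sum(vs) for k, vs in grouped.items()}

# Combine-style: fold each value into a running accumulator,
# never holding the full per-key value list in memory.
combined = defaultdict(int)
for key, value in pairs:
    combined[key] += value

print(group_then_sum)   # {'a': 4, 'b': 6}
print(dict(combined))   # {'a': 4, 'b': 6}
```

Both approaches produce the same sums here; the combine style matters at scale, where per-key value lists can be too large to hold at one worker.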
  • BigQuery
    • Serverless Data Warehousing
    • Columnar vs. Row-based Storage
    • Normalization vs. Denormalization with Columnar Storage
    • Projects, Datasets, Tables
    • Batch Data Import/Export
    • Semi-Structured Data Analysis with SQL Arrays and Structs
    • Partitions and Performance Optimizations
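The columnar vs. row-based storage contrast in the BigQuery module can also be sketched in plain Python: in a column-oriented layout, an aggregate over one field only touches that field's values, which is why BigQuery charges by columns scanned rather than rows read. The layouts and sample records below are illustrative, not BigQuery's actual storage format:

```python
# Row-based layout: each record stored together (OLTP-friendly).
rows = [
    {"user": "ann", "country": "US", "spend": 10.0},
    {"user": "bob", "country": "DE", "spend": 25.0},
    {"user": "cat", "country": "US", "spend": 5.0},
]

# Column-based layout: each field stored contiguously (OLAP-friendly).
columns = {
    "user": ["ann", "bob", "cat"],
    "country": ["US", "DE", "US"],
    "spend": [10.0, 25.0, 5.0],
}

# Aggregating one field reads every whole record in the row layout...
total_row = sum(r["spend"] for r in rows)

# ...but only a single column's values in the columnar layout.
total_col = sum(columns["spend"])

print(total_row, total_col)  # 40.0 40.0
```

The same intuition motivates denormalization with arrays and structs: nesting related data inside a row avoids joins while columnar storage keeps single-field scans cheap.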
  • Data Engineering with GCP
    • Architectures for Sample Batch and Streaming Pipelines
    • GCP Storage Optimal Access Patterns
    • GCP Storage Service Selection Decision Model
    • Cost Estimation
“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware
