The Intermediate Google Cloud for Data Engineers training course advances the skills of students who are already familiar with the data engineering capabilities of Google Cloud, enabling them to build specialized data pipelines, including those for machine learning, streaming data analytics, and recommendation systems.
The course starts by exploring data engineering with unbounded data sets and how streaming data analytics pipelines built with Apache Beam and Dataflow compare to alternatives such as the Lambda architecture. After building a data pipeline with Bigtable, Dataflow, and BigQuery, students learn what it takes to create data pipelines for machine learning and recommendation systems. The course concludes by covering the importance of reproducibility when creating training, evaluation, and test data sets, and then uses TensorFlow together with Apache Beam for feature engineering of both structured and unstructured data.
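To give a flavor of what "unbounded data" work involves: Beam and Dataflow formalize the idea of grouping events into event-time windows rather than by arrival order. The sketch below illustrates that tumbling-window concept in plain Python, without the Beam SDK; `tumbling_window_counts` is a hypothetical helper for illustration, not part of any course material or library.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size_s):
    """Bucket timestamped events into fixed (tumbling) event-time windows.

    `events` is an iterable of (event_time_seconds, key) pairs; the result
    maps (window_start, key) -> count, mimicking a windowed group-and-count.
    """
    counts = defaultdict(int)
    for event_time, key in events:
        # Assign by event time, so out-of-order arrivals land in the right window.
        window_start = (event_time // window_size_s) * window_size_s
        counts[(window_start, key)] += 1
    return dict(counts)

# The event at t=5 arrives "late" but is still counted in the first 60s window.
events = [(3, "a"), (12, "a"), (7, "b"), (61, "a"), (5, "a")]
print(tumbling_window_counts(events, 60))
# {(0, 'a'): 3, (0, 'b'): 1, (60, 'a'): 1}
```

A real Beam pipeline expresses the same logic declaratively (a windowing transform followed by a grouped aggregation) and adds watermarks and triggers to decide when a window's result is emitted.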
Before attending this course, students should complete the Google Cloud for Data Engineers course or be familiar with all of the topics listed here: Google Cloud for Data Engineers
Purpose
| Learn how to use data engineering on Google Cloud to build specialized data pipelines for large-scale streaming data analytics, machine learning, and recommendation systems. |
Audience
| Data Engineers who want to build specialized types of data pipelines, including those for machine learning, streaming data analytics, and recommendation systems. |
Role
| Data Engineer - Software Developer - Technical Manager |
Skill Level
| Intermediate |
Style
| Fast Track - Targeted Topic - Workshops |
Duration
| 3 Days |
Related Technologies
| Apache Spark | Google Cloud | TensorFlow | Apache Beam |
Productivity Objectives
- Construct data processing pipelines for streaming data analysis and machine learning.
- Create high-performance, internet-scale, low-latency data stores with Bigtable.
- Develop data pipelines to support machine learning model training and serving.
- Employ TensorFlow, Dataflow, and BigQuery for unstructured and structured data pipelines.
- Design and propose scenarios for large-scale data migrations to Google Cloud.
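One concrete skill behind the Bigtable objective above is row-key design: because Bigtable sorts rows lexicographically by key, keys are composed so that related rows are contiguous and the most useful ordering falls out of a simple scan. The sketch below shows one common pattern (reversed timestamp so newer events sort first); `make_row_key` and the `MAX_TS` sentinel are illustrative assumptions, not course code.

```python
# Sentinel larger than any Unix timestamp we expect to store (assumption).
MAX_TS = 10**10

def make_row_key(user_id, event_ts):
    """Compose a Bigtable-style row key.

    Prefixing with user_id keeps one user's rows contiguous; subtracting the
    timestamp from a fixed sentinel (zero-padded so string order matches
    numeric order) makes newer events sort first within that user.
    """
    reversed_ts = MAX_TS - event_ts
    return f"{user_id}#{reversed_ts:010d}"

k_old = make_row_key("user42", 1_700_000_000)
k_new = make_row_key("user42", 1_700_000_500)
# Lexicographically, the newer event's key sorts before the older one,
# so a prefix scan on "user42#" returns the most recent events first.
print(k_new < k_old)
# True
```

The same composition idea guards against hotspotting: keys that begin with a monotonically increasing value (like a raw timestamp) concentrate writes on one tablet, whereas a high-cardinality prefix spreads them out.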