The Google Cloud for Data Engineers training course teaches students the fundamentals of Google Cloud Platform (GCP) for building and running data pipelines that process batch or streaming data.
The course begins with an overview of the GCP services that data engineers use most often. Students then build on their existing SQL, Hadoop, and Python skills by learning how to reuse existing applications on managed MySQL and Hadoop/Spark infrastructure on GCP. Most of the course focuses on the capabilities that differentiate GCP for data engineering: students learn how to process, analyze, and store petabytes of batch and streaming data with serverless services such as Pub/Sub, BigQuery, and Dataflow. For example, students work with Apache Beam code that goes beyond the limitations of the original MapReduce framework. The course concludes with an introduction to the machine learning capabilities of GCP that data engineers can start using without prior data science experience.
The course also provides architectural overviews of data processing pipelines enabled by GCP and explains how to choose the right GCP services for your project.
| Purpose | Learn to build systems on Google Cloud to store and process batch or streaming data. |
| Audience | Developers and developer teams looking to use streaming data on GCP. |
| Role | Data Engineer - Software Developer |
| Skill Level | Introduction |
| Style | Fast Track - Hack-a-thon - Learning Spikes - Workshops |
| Duration | 3 Days |
| Related Technologies | BigQuery, Hadoop, Google Cloud, Python, MySQL, Apache |
Productivity Objectives
- Describe the capabilities of Google Cloud for data engineering.
- Build and run data processing pipelines on GCP to ingest, analyze, and store data.
- Identify how to use managed Google Cloud infrastructure for MySQL and Hadoop/Spark.
- Discuss when and how to use Pub/Sub, Dataflow, and BigQuery for serverless data pipelines.
- Integrate data pipelines with other GCP services.
- Identify the criteria to use when designing data processing pipelines on GCP.