Intermediate Google Cloud For Data Analysts

The Intermediate Google Cloud For Data Analysts training course is designed to advance the skills and knowledge of students already familiar with data analysis on Google Cloud, introducing more advanced features and functionality, including predictive, transactional, and large-scale distributed data analytics.

This course will start by using the MapReduce-based, batch data analysis tools available as managed infrastructure services on Google Cloud, including Apache Hive, Apache Pig, and PySpark. Next, students will use Apache Beam to analyze both batch and streaming data using a single data pipeline. The course will conclude by preparing students to perform data analysis operations commonly used in predictive analytics and machine learning, including feature creation and feature pre-processing.
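
As a small taste of the MapReduce model the course opens with, the sketch below contrasts map and flatMap using only the Python standard library. It is an illustration rather than course material: the sample lines and variable names are assumptions, and on Dataproc the same idea would be expressed with PySpark instead of plain Python.

```python
from collections import Counter
from itertools import chain

# Hypothetical input: two lines of text standing in for a large dataset.
lines = ["big data on google cloud", "data analysis on google cloud"]

# map: exactly one output per input (here, one list of words per line).
mapped = list(map(str.split, lines))

# flatMap: each input may yield zero or more outputs,
# and all outputs are flattened into a single sequence.
flat_mapped = list(chain.from_iterable(map(str.split, lines)))

# reduce: aggregate the flattened records by key (word count per word).
word_counts = Counter(flat_mapped)

print(len(mapped))       # number of input lines
print(len(flat_mapped))  # number of individual words
print(word_counts)
```

The map step keeps the line structure, while the flatMap step discards it, which is why word counts are normally built on flatMap.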

This course targets students who have either taken the Introduction to Google Cloud for Data Analysts course or have equivalent knowledge/experience. The course will be conducted on Google Cloud Platform. Students will need a reasonably powerful laptop running an up-to-date browser (preferably Chrome). Make sure that the laptop is well charged in advance!

Course Summary

Purpose: 
Learn how to analyze large-scale, distributed, and real-time datasets with the MapReduce- and Apache Beam-based capabilities of Google Cloud, and practice identifying and analyzing effective data features for predictive analytics with BigQuery ML and TensorFlow.
Audience: 
Developers and developer teams looking to dive deeper into the data science capabilities of the Google Cloud Platform.
Skill Level: 
Learning Style: 

Hands-on training is customized, instructor-led training with an in-depth presentation of a technology and its concepts, featuring such topics as Java, OOAD, and Open Source.

Seminars are highly-focused, lecture-heavy, half-day to multi-day learning events. Seminars are a great way to create an awareness level of knowledge for a large number of concepts, in a short period of time. Think wide (breadth) and thin (depth).

Workshops are instructor-led lab-intensives focused on the practical application of technologies through the facilitation of a project-related lab. Workshops are just the opposite of Seminars. They deliver the highest level of knowledge transfer of any format. Think wide (breadth) and deep (depth).

Duration: 
2 Days
Productivity Objectives: 
  • Use Dataproc to perform MapReduce-based data analysis
  • Integrate transactional data from a Cloud SQL database in data analysis
  • Use Apache Beam-based data analysis pipelines for batch and streaming data
  • Support data science and machine learning through analysis of effective data features
  • Use Google Colab and Jupyter notebooks for Python-based data analysis

What You'll Learn

In the Intermediate Google Cloud For Data Analysts training course you’ll learn:

  • MapReduce for Data Analysts
    • Map vs. FlatMap for MapReduce
    • Running Apache Hive, Apache Pig, and PySpark
    • Provisioning Managed Apache Hadoop/Spark/YARN Infrastructure
    • Preemptible Instances for MapReduce
    • Dataproc User Interface
    • Running and Monitoring MapReduce Jobs
  • Cloud SQL
    • Transactional Data for Analysis
    • Provisioning Managed Database Infrastructure
    • Configuration of MySQL on GCP
    • Batch Data Import/Export with Cloud SQL
    • Web-based Interface
    • Integration of MySQL with GCP Services and Applications
    • Recommendation Systems with Cloud SQL
  • Apache Beam for Data Analysts
    • Batch and Streaming Data Processing Pipelines
    • Run Apache Beam in Cloud Shell
    • Apache Beam Combine vs. GroupBy
    • Submitting Apache Beam Pipelines
    • Running Batch and Streaming Dataflow Jobs
    • Apache Beam Pipelines with Side-Inputs
    • Autoscaling Streaming Apache Beam Jobs
    • Apache Beam Windows and Triggers
    • Web-based and Command Line Interface
    • Monitoring Dataflow Jobs
  • Jupyter for Data Analysis
    • Google Colab
    • BigQuery from Colab
    • Pandas DataFrames and Series
    • GroupBy and Pivot Table
    • Data Visualization with Seaborn
    • Predictive Analytics with TensorFlow
  • Data Analytics for Data Science
    • Five Criteria for Effective Data Features
    • Feature Engineering Case Studies and Best Practices
    • Feature Crosses, Quantization, One-hot Encoding
    • Feature Creation and Pre-processing in a Machine Learning Pipeline
    • Feature Engineering for Wide-and-Deep Models
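
To make the feature engineering topics above more concrete, here is a minimal sketch of one-hot encoding, quantization (bucketing), and a feature cross in plain Python. The ride record, the bucket boundaries, and the helper names are hypothetical and chosen only for illustration; in the course these steps would more likely be done with BigQuery ML or TensorFlow.

```python
from bisect import bisect_right

def one_hot(value, vocabulary):
    """Return a 0/1 vector with a 1 at the position of value in vocabulary."""
    return [1 if value == v else 0 for v in vocabulary]

def quantize(value, boundaries):
    """Return the index of the bucket that a numeric value falls into."""
    return bisect_right(boundaries, value)

def feature_cross(a, b):
    """Combine two categorical values into a single categorical feature."""
    return f"{a}_x_{b}"

# Hypothetical record: a ride that started on Friday at 17:00.
ride = {"day": "Fri", "hour": 17}

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
hour_boundaries = [6, 12, 18]  # assumed buckets: <6, 6-11, 12-17, >=18

day_vec = one_hot(ride["day"], days)
hour_bucket = quantize(ride["hour"], hour_boundaries)
crossed = feature_cross(ride["day"], hour_bucket)

print(day_vec)
print(hour_bucket)
print(crossed)
```

The crossed feature lets a model learn a separate weight for each day and time-of-day combination, instead of treating the two inputs independently.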

Get Custom Training Quote

We'll work with you to design a custom Intermediate Google Cloud For Data Analysts training program that meets your specific needs. A 100% guaranteed plan that works for you, your team, and your budget.

Chat with one of our Program Managers from our Boulder, Colorado office to discuss various training options.

DevelopIntelligence has been in the technical/software development learning and training industry for nearly 20 years. We’ve provided learning solutions to more than 48,000 engineers, across 220 organizations worldwide.

Need help finding the right learning solution?   Call us: 877-629-5631