Scalable Machine Learning

Course Summary

The Scalable Machine Learning (SML) course introduces students to scalable machine learning. It focuses on using the Hadoop and Spark frameworks to implement SML algorithms in the Scala and Python programming languages.

The course begins with an introduction to SML and why developers use Spark for SML. Next, it covers data acquisition, data pre-processing for modeling, and working with iterative algorithms. The course concludes with model evaluation, optimization, and deployment.

Purpose	Learn about and build end-to-end SML pipelines for gaining actionable insights.
Audience	Teams needing to gracefully scale up their Machine Learning projects.
Role	Data Engineer - Data Scientist - Software Developer
Skill Level	Intermediate
Style	Hackathon - Learning Spikes - Workshops
Duration	3 Days
Related Technologies	Apache Spark | Hadoop | Python

Productivity Objectives

Describe the role of Spark in Machine Learning.
Apply machine learning to massive datasets.
Demonstrate experience in Data Acquisition, Processing, Analysis and Modeling using Hadoop and Spark.
Evaluate common types of data (e.g., CSV, XML, JSON, and social media data) for pre-processing and/or building machine learning models using Spark.
Train, tune, test and deploy Machine Learning Models.

What You'll Learn:

In the Scalable Machine Learning training course, you'll learn:

Introduction to SML
- What is SML?
- Why is it required?
- Key platforms for performing SML
- SML Project End-to-End Pipeline
- Spark Introduction
- Why Spark for SML?
- Databricks Platform Demo
- Approaches for scaling scikit-learn code
- Hands-on Exercise(s): Experiencing the first notebook
Why Spark for SML?
- Problems with Traditional Machine Learning Frameworks
- Machine Learning at Scale - Various options
- Iterative Algorithms
- How Spark performs well for Iterative Machine Learning Algorithms
- Hands-on Exercise(s)
SML on Enterprise Platform
- Quick Recap/Introduction to Hadoop
- Logical View of Cloudera Distribution
- Big Data Analytics Pipelines
- Components in Cloudera Distribution for performing SML
- Hands-on Exercise(s)
Data Acquisition at Scale
- Acquiring Structured content from Relational Databases
- Acquiring Semi-structured content from Log Files
- Acquiring Unstructured content from other key sources like Web
- Tools for Performing Data acquisition at Scale
- Sqoop, Flume and Kafka Introduction, use cases and architectures
- Hands-on Exercise(s)
Data Pre-Processing for Modeling
- Using the Spark Shell
- Resilient Distributed Datasets (RDDs)
- Functional Programming with Spark
- RDD Operations
- Key-Value Pair RDDs
- MapReduce and Pair RDD Operations
- Building and Running a Spark Application
- Performing Data Validation
- Data De-Duplication
- Detecting Outliers
- Hands-on Exercise(s)
Working with Iterative Algorithms
- Dealing with RDD Infinite Lineages
- Caching Overview
- Distributed Persistence
- Checkpointing of an Iterative Machine Learning Algorithm
- Hands-on Exercise(s)
Spark SQL
- Introduction
- DataFrame API
- Performing ad-hoc query analysis using Spark SQL
- Hands-on Exercise(s)
Spark Machine Learning Using MLlib
- Spark ML vs Spark MLlib
- Data types and key terms
- Feature Extraction
- Linear Regression using Spark MLlib
- Hands-on Exercise(s)
Spark Machine Learning Using ML
- Spark ML Overview
- Transformers and Estimators
- Pipelines
- Implementing Decision Trees
- K-Means Clustering using Spark ML
- Hands-on Exercise(s)
Decision Trees and Random Forest
- Types - Classification and Regression trees
- Gini Index, Entropy and Information Gain
- Building Decision Trees
- Pruning the trees
- Prediction using Trees
- Ensemble Models
- Bagging and Boosting
- Advantages of using Random Forest
- Working with Random Forest
- Ensemble Learning
- How ensemble learning works
- Building models using Bagging
- Random Forest algorithm
- Random Forest model building
- Fine tuning hyper-parameters
- Hands-on Exercise(s)
Model Evaluation, Optimization and Deployment
- Model Evaluation
- Optimizing a Model
- Deploying a Model
- Best Practices

Real-World Content

Project-focused demos and labs using your tool stack and environment, not some canned "training room" lab.

Expert Practitioners

Industry experts that bring their battle scars into the classroom.

Experiential Learning

More coding than lecture, coupled with architectural and design discussions.

Tailored Outlines

One-size-fits-all doesn't apply to training teams. That's where we come in!

“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware
