Introduction to Apache Spark

Course Summary

The Introduction to Apache Spark training course teaches the skills needed to work with Apache Spark, an open-source data processing engine in the Hadoop ecosystem that is optimized for speed and advanced analytics.

The course begins by examining how to use Spark as an alternative to traditional MapReduce processing. Next, it explores how Spark supports streamed data processing and iterative algorithms. The course concludes with a lesson on how Spark enables jobs to run faster than traditional Hadoop MapReduce.

Purpose: Learn how to use Apache Spark as an alternative to traditional MapReduce processing.
Audience: Developers working on projects that use traditional Hadoop MapReduce.
Role: Software Developer
Skill Level: Introduction
Style: Hack-a-thon - Learning Spikes - Workshops
Duration: 3 Days
Related Technologies: Apache Spark | Hadoop | Apache

Productivity Objectives
  • Describe how Apache Spark and Hadoop fit together
  • List three motivations for using Spark
  • Understand Resilient Distributed Datasets (RDDs)
  • Implement an application using the key Spark concepts

What You'll Learn:

In the Introduction to Apache Spark training course, you'll learn:
  • Spark Basics
    • What is Apache Spark?
    • Using the Spark Shell
    • Resilient Distributed Datasets (RDDs)
    • Functional Programming with Spark
  • The Hadoop Distributed File System (HDFS)
    • Why HDFS?
    • HDFS Architecture
    • Using HDFS
  • Spark and Hadoop
    • Spark and the Hadoop Ecosystem
    • Spark and MapReduce
  • RDDs
    • RDD Operations
    • Key-Value Pair RDDs
    • MapReduce and Pair RDD Operations
  • Running Spark on a Cluster
    • Standalone Cluster
    • The Spark Standalone Web UI
  • Parallel Programming with Spark
    • RDD Partitions and HDFS Data Locality
    • Working With Partitions
    • Executing Parallel Operations
  • Caching and Persistence
    • Distributed Persistence
    • Caches
  • Writing Spark Applications
    • SparkContext
    • Spark Properties
    • Building and Running a Spark Application
    • Logging
  • Spark Streaming
    • Streaming Overview
    • Sliding Window Operations
    • Spark Streaming Applications

“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware
