Introduction to Apache Spark

The Introduction to Apache Spark training course teaches developers the skills needed to work with Apache Spark, an open-source data-processing engine for the Hadoop ecosystem, optimized for speed and advanced analytics.

During this course, you learn how to use Spark as an alternative to traditional MapReduce processing and explore how Spark supports stream processing and iterative algorithms, which can let some jobs run 10x to 100x faster than traditional Hadoop MapReduce.

Course Summary

Learn how to use Apache Spark as an alternative to traditional MapReduce processing.
Audience: Developers working on projects that use traditional Hadoop MapReduce.
Learning Style: Hands-on

Hands-on training is customized, instructor-led training with an in-depth presentation of a technology and its concepts, featuring such topics as Java, OOAD, and open source.

Duration: 3 days
Productivity Objectives: 
  • Describe how Apache Spark and Hadoop fit together
  • List 3 motivations for using Spark
  • Understand and describe Resilient Distributed Datasets (RDDs)
  • Implement an application using the key Spark concepts
Introduction to Apache Spark is part of the Apache Training curriculum.

What You'll Learn

In the Introduction to Apache Spark training course, you’ll learn:

  • Spark Basics
    • What is Apache Spark?
    • Using the Spark Shell
    • Resilient Distributed Datasets (RDDs)
    • Functional Programming with Spark
  • The Hadoop Distributed File System
    • Why HDFS?
    • HDFS Architecture
    • Using HDFS
  • Spark and Hadoop
    • Spark and the Hadoop Ecosystem
    • Spark and MapReduce
  • RDDs
    • RDD Operations
    • Key-Value Pair RDDs
    • MapReduce and Pair RDD Operations
  • Running Spark on a Cluster
    • Standalone Cluster
    • The Spark Standalone Web UI
  • Parallel Programming with Spark
    • RDD Partitions and HDFS Data Locality
    • Working With Partitions
    • Executing Parallel Operations
  • Caching and Persistence
    • Distributed Persistence
    • Caching
  • Writing Spark Applications
    • SparkContext
    • Spark Properties
    • Building and Running a Spark Application
    • Logging
  • Spark Streaming
    • Streaming Overview
    • Sliding Window Operations
    • Spark Streaming Applications

Meet Your Instructor

Michael is a practicing software developer, course developer, and trainer with DevelopIntelligence. For most of his career, Michael has designed and implemented large-scale, enterprise-grade, Java-based applications at major telecommunications and Internet companies, such as Level 3 Communications, US West/Qwest/CenturyLink, Orbitz, and others.

Michael has a passion for learning new technologies, patterns, and paradigms (or, he has a tendency to get bored or disappointed with current ones)....

Sujee has been developing software for 15 years. In the last few years, he has been consulting and teaching Hadoop, NoSQL, and cloud technologies.
Sujee stays active in the Hadoop / open source community. He runs a developer-focused meetup and Hadoop hackathons called 'Big Data Gurus', and he has presented at a variety of meetups.
Sujee contributes to the Hadoop project and other open source projects. He writes about Hadoop and other technologies...

Andrew S

Andrew is a mathematician turned software engineer who loves building systems. After graduating with a PhD in pure math, he became fascinated by software startups and has since spent 20 years learning. During this period, he’s worked on a wide variety of projects and platforms, including big data analytics, enterprise optimization, mathematical finance, cross-platform middleware, and medical imaging.

In 2001, Andrew served as company architect at ProfitLogic, a pricing optimization startup...

Jeff Newburn

Jeff is a software development veteran with over 15 years of experience writing software in a variety of languages.

After years of exploring various languages, including PHP, Java, and Python, he created Zappos' first Tech University, charged with educating the company's technical staff. During this time, he also developed the main training program that brought the department into the Amazon fold as a full-fledged dev shop on Amazon's tools and systems.


Get Custom Training Quote

We'll work with you to design a custom Introduction to Apache Spark training program that meets your specific needs. A 100% guaranteed plan that works for you, your team, and your budget.

Chat with one of our Program Managers from our Boulder, Colorado office to discuss various training options.

DevelopIntelligence has been in the technical/software development learning and training industry for nearly 20 years. We’ve provided learning solutions to more than 48,000 engineers, across 220 organizations worldwide.

Need help finding the right learning solution? Call us: 877-629-5631