Introduction to Apache Spark

The Introduction to Apache Spark training course teaches developers the skills needed to work with Apache Spark, an open-source data processing engine in the Hadoop ecosystem that is optimized for speed and advanced analytics.

During this course, you learn how to use Spark as an alternative to traditional MapReduce processing. You also explore how Spark supports streamed data processing and iterative algorithms, and how it can run jobs 10x to 100x faster than traditional Hadoop MapReduce.

Course Summary

Purpose: 
Learn how to use Apache Spark as an alternative to traditional MapReduce processing.
Audience: 
Developers working on projects that use traditional Hadoop MapReduce.
Skill Level: 
Learning Style: 
Hands On

Hands-on training is customized, instructor-led training with an in-depth presentation of a technology and its concepts, featuring such topics as Java, OOAD, and Open Source.

Duration: 
3 Days
Productivity Objectives: 
  • Describe how Apache Spark and Hadoop fit together
  • List 3 motivations for using Spark
  • Describe and understand RDDs
  • Implement an application using the key Spark concepts
Introduction to Apache Spark is part of the Apache Training curriculum.

What You'll Learn

In the Introduction to Apache Spark training course, you’ll learn:

  • Spark Basics
    • What is Apache Spark?
    • Using the Spark Shell
    • Resilient Distributed Datasets (RDDs)
    • Functional Programming with Spark
  • The Hadoop Distributed File System
    • Why HDFS?
    • HDFS Architecture
    • Using HDFS
  • Spark and Hadoop
    • Spark and the Hadoop Ecosystem
    • Spark and MapReduce
  • RDDs
    • RDD Operations
    • Key-Value Pair RDDs
    • MapReduce and Pair RDD Operations
  • Running Spark on a Cluster
    • Standalone Cluster
    • The Spark Standalone Web UI
  • Parallel Programming with Spark
    • RDD Partitions and HDFS Data Locality
    • Working With Partitions
    • Executing Parallel Operations
  • Caching and Persistence
    • Distributed Persistence
    • Caching
  • Writing Spark Applications
    • SparkContext
    • Spark Properties
    • Building and Running a Spark Application
    • Logging
  • Spark Streaming
    • Streaming Overview
    • Sliding Window Operations
    • Spark Streaming Applications

Meet Your Instructor

Michael

Michael is a practicing software developer, course developer, and trainer with DevelopIntelligence. For the majority of his career, Michael has designed and implemented large-scale, enterprise-grade, Java-based applications at major telecommunications and Internet companies, such as Level3 Communications, US West/Qwest/Century Link, Orbitz, and others.

Michael has a passion for learning new technologies, patterns, and paradigms (or, he has a tendency to get bored or disappointed with current ones)....

Sujee

Sujee has been developing software for 15 years. In the last few years, he has been consulting on and teaching Hadoop, NoSQL, and cloud technologies. Sujee stays active in the Hadoop and open source community. He runs a developer-focused meetup and Hadoop hackathons called ‘Big Data Gurus’. He has presented at a variety of meetups. Sujee contributes to the Hadoop project and other open source projects. He writes about Hadoop and other technologies on his website.

Andrew S

Andrew is a mathematician turned software engineer who loves building systems. After graduating with a PhD in pure math, he became fascinated by software startups and has since spent 20 years learning. During this period, he’s worked on a wide variety of projects and platforms, including big data analytics, enterprise optimization, mathematical finance, cross-platform middleware, and medical imaging.

In 2001, Andrew served as company architect at ProfitLogic, a pricing optimization startup...


Contact us to learn more

Not all training courses are created equal. Let the customization process begin! We'll work with you to design a custom Introduction to Apache Spark training course that meets your specific needs.

DevelopIntelligence has been in the technical/software development learning and training industry for nearly 20 years. We’ve provided learning solutions to more than 48,000 engineers, across 220 organizations worldwide.


Need help finding the right learning solution? Call us: 877-629-5631