The Introduction to Apache Spark training course teaches developers the skills needed to work with Apache Spark, an open-source data processing engine in the Hadoop ecosystem that is optimized for speed and advanced analytics.
During this course, you learn how to use Spark as an alternative to traditional MapReduce processing. You also explore how Spark supports streamed data processing and iterative algorithms, and how it enables jobs to run from 10x to 100x faster than traditional Hadoop MapReduce.
- Describe how Apache Spark and Hadoop fit together
- List three motivations for using Spark
- Describe and understand RDDs
- Implement an application using the key Spark concepts
What You'll Learn
In the Introduction to Apache Spark training course, you’ll learn:
- Spark Basics
  - What is Apache Spark?
  - Using the Spark Shell
  - Resilient Distributed Datasets (RDDs)
  - Functional Programming with Spark
- The Hadoop Distributed File System
  - Why HDFS?
  - HDFS Architecture
  - Using HDFS
- Spark and Hadoop
  - Spark and the Hadoop Ecosystem
  - Spark and MapReduce
- RDD Operations
  - Key-Value Pair RDDs
  - MapReduce and Pair RDD Operations
- Running Spark on a Cluster
  - Standalone Cluster
  - The Spark Standalone Web UI
- Parallel Programming with Spark
  - RDD Partitions and HDFS Data Locality
  - Working With Partitions
  - Executing Parallel Operations
- Caching and Persistence
  - Distributed Persistence
- Writing Spark Applications
  - Spark Properties
  - Building and Running a Spark Application
- Spark Streaming
  - Streaming Overview
  - Sliding Window Operations
  - Spark Streaming Applications
Meet Your Instructor
Michael is a practicing software developer, course developer, and trainer with DevelopIntelligence. For the majority of his career, Michael has designed and implemented large-scale, enterprise-grade, Java-based applications at major telecommunications and Internet companies, such as Level3 Communications, US West/Qwest/Century Link, Orbitz, and others.
Michael has a passion for learning new technologies, patterns, and paradigms (or, he has a tendency to get bored or disappointed with current ones)...
Sujee
Sujee has been developing software for 15 years. In the last few years, he has been consulting on and teaching Hadoop, NoSQL, and cloud technologies. Sujee stays active in the Hadoop and open source community. He runs a developer-focused meetup and Hadoop hackathons called ‘Big Data Gurus’. He has presented at a variety of meetups. Sujee contributes to the Hadoop project and other open source projects. He writes about Hadoop and other technologies on his website.
Andrew S
Andrew is a mathematician turned software engineer who loves building systems. After graduating with a PhD in pure math, he became fascinated by software startups and has since spent 20 years learning. During this period, he’s worked on a wide variety of projects and platforms, including big data analytics, enterprise optimization, mathematical finance, cross-platform middleware, and medical imaging.
In 2001, Andrew served as company architect at ProfitLogic, a pricing optimization startup...