Advanced Spark

The Advanced Spark training course provides a deeper dive into Spark. Spark internals and the debugging and troubleshooting of Spark applications are a central focus. The course also covers integration with other storage systems such as Cassandra, HBase, and other NoSQL databases.

Course Summary

Learn Spark internals and how to work with NoSQL databases, as well as debugging and troubleshooting techniques.
Audience: Developers who have taken an introductory Spark course or who have equivalent experience.
Learning Style: Hands-On

Hands-on training is customized, instructor-led training with an in-depth presentation of a technology and its concepts, featuring such topics as Java, OOAD, and Open Source.

Duration: 4 days
Productivity Objectives: 
  • Building on Spark fundamentals, gain a deeper understanding of Spark internals
  • Learn the operational tuning needed to get maximum performance from Spark
  • Learn how to use GraphX for graph processing and MLlib for machine learning
Advanced Spark is part of the Apache Training curriculum.

What You'll Learn

In the Advanced Spark training course, you’ll learn:

  • Review of core Apache Spark concepts
    • How Spark works
    • RDD fundamentals
    • Spark SQL and DataFrames
    • Spark Streaming concepts
    • Machine Learning basics
  • Understanding Spark Internals for Performance
    • Scheduling, jobs, and tasks
    • Data structures, data sets and data lakes
    • Shuffle and performance
    • Understanding data sources and partitions
    • Reads, writes, and performance
  • New Features of Spark 2
    • API Stability
    • Core and Spark SQL changes
    • Changes to packaging and operations
  • Working with Spark
    • Debugging/troubleshooting Spark apps
    • Developing data workflows
    • Automated Spark builds using Maven
  • Clustering with Spark
    • Running a Spark cluster
    • Cluster resource requirements
    • Managing memory on executors and workers
    • Managing memory and cores across a Spark cluster
    • Performance tuning
    • Best practices
  • Spark Integration
    • Implementing Spark on DataStax, Hortonworks, etc.
    • Integration with Cassandra
    • Integration with Kafka
    • Integration with Elasticsearch
    • Integration with other compatible NoSQL implementations (as desired)
  • Machine Learning with Spark
    • Common algorithms
    • Commonly used algorithms with Scala
    • Machine learning libraries: MLlib, H2O
    • Writing custom algorithms
  • Advanced Spark SQL and Spark Streaming
    • Leveraging the Spark 2 API (SparkSession, etc.)
    • Developing with Spark DataFrames
    • Writing solid Spark jobs
    • When to use Spark and when not to
  • High-Performance Spark Applications
    • Performance tuning process
    • Performance tuning metrics
    • SQL performance tuning
    • High-performance caching strategies
    • Cluster resource requirements
    • Building fault tolerance
  • Best Practices and Q&A

Meet Your Instructors

Sujee

Sujee has been developing software for 15 years. In the last few years he has been consulting on and teaching Hadoop, NoSQL, and cloud technologies.
Sujee stays active in the Hadoop and open source community. He runs a developer-focused meetup and Hadoop hackathons called ‘Big Data Gurus’. He has presented at a variety of meetups.
Sujee contributes to the Hadoop project and other open source projects. He writes about Hadoop and other technologies...

Andrew S

Andrew is a mathematician turned software engineer who loves building systems. After graduating with a PhD in pure math, he became fascinated by software startups and has since spent 20 years learning. During this period, he’s worked on a wide variety of projects and platforms, including big data analytics, enterprise optimization, mathematical finance, cross-platform middleware, and medical imaging.

In 2001, Andrew served as company architect at ProfitLogic, a pricing optimization startup...

Jeff Newburn

Jeff is a software development veteran with over 15 years of experience writing software in a variety of languages.

After years of working with languages including PHP, Java, and Python, he created Zappos’ first Tech University, charged with the technical education of engineering staff. During this time, he also developed the main training program that brought the department into the Amazon fold as a full-fledged development shop using Amazon’s tools and systems.


Get Custom Training Quote

We'll work with you to design a custom Advanced Spark training program that meets your specific needs: a 100% guaranteed plan that works for you, your team, and your budget.

Chat with one of our Program Managers from our Boulder, Colorado office to discuss various training options.

DevelopIntelligence has been in the technical/software development learning and training industry for nearly 20 years. We’ve provided learning solutions to more than 48,000 engineers across 220 organizations worldwide.

Need help finding the right learning solution? Call us: 877-629-5631