This Spark Optimization training course covers advanced Spark techniques for tuning applications.
The course begins with a review of Spark, including its architecture, terminology, and use with Hadoop. From there, students learn about the Spark execution environment and YARN, choosing the right data formats, and managing Spark partitions. The course concludes by exploring Spark physical execution, the Spark Core API, caching and checkpointing, joins, and optimization.
The course is offered in both Python and Scala.
Purpose
| Learn best practices and techniques to optimize Spark Core and Spark SQL code. |
Audience
| Engineers looking to upskill in Spark. |
Role
| Data Engineer - Software Developer |
Skill Level
| Advanced |
Style
| Workshops |
Duration
| 3 Days |
Related Technologies
| Apache Spark | Hadoop |
Productivity Objectives
- Integrate Spark with YARN
- Work with binary data formats
- Identify the internals of Spark
- Optimize Spark Core and Spark SQL code
- Discuss best practices for writing Spark Core and Spark SQL code