Course Summary
This Spark Optimization training course covers advanced Spark techniques for tuning applications.
The course begins with a review of Spark, including its architecture, key terms, and how to use Hadoop with Spark. From there, students learn about the Spark execution environment and YARN, how to choose the right data format, and how to work with Spark partitions. The course concludes by exploring Spark physical execution, the Spark Core API, caching and checkpointing, joins, and Spark SQL optimization.
The course is offered in the Python and Scala programming languages.
- Productivity Objectives:
- Integrate aspects of Spark on YARN
- Deal with Binary Data Formats
- Identify the Internals of Spark
- Optimize Spark Core and Spark SQL Code
- Discuss best practices when writing Spark Core and Spark SQL Code
Request Information
Get your team upskilled or reskilled today. Chat with one of our experts to create a custom training proposal. Fully customized at no additional cost.
If you are not completely satisfied with your training class, we'll give you your money back.
About Our Training
- Real-World Content: Project-focused demos and labs using your tool stack and environment, not some canned "training room" lab.
- Expert Practitioners: Industry experts with 15+ years of experience who bring their battle scars into the classroom.
- Experiential Learning: More coding than lecture, coupled with architectural and design discussions.
- Fully Customized: One-size-fits-all doesn't apply to training teams. That's where we come in!
What You'll Learn
In the Spark Optimization training course, you'll learn:
- Spark Overview
- Logical Architecture
- Physical Architecture of Spark
- Common Concepts and Terms in Spark
- Ways to build applications on Spark
- Spark with Hadoop
- Understanding Spark Execution Environment – YARN
- About YARN
- Why YARN
- Architecture of YARN
- YARN UI and Commands
- Internals of YARN
- Executing a Spark application on YARN
- Troubleshooting and Debugging Spark applications on YARN
- Optimizing Application Performance
- Working with the Right Data Format
- Why Data Formats are important for optimization
- Key Data Formats
- Comparisons – which one to choose when?
- Working with Avro
- Working with Parquet
- Working with ORC
- Dealing with Spark Partitions
- How Spark determines number of Partitions
- Things to keep in mind when determining the number of partitions
- Small Partitions Problem
- Diagnosing & Handling Post Filtering Issues (Skewness)
- Repartition vs Coalesce
- Spark Physical Execution
- Spark Core Plan
- Modes of Execution
- YARN Client vs YARN Cluster
- Standalone Mode
- Physical Execution on Cluster
- Narrow vs Wide Dependency
- Spark UI
- Executor Memory Architecture
- Key Properties
- Effective Development Using Spark Core API
- Use of groupByKey and reduceByKey
- Using the right datatype in RDD
- How to ensure memory is utilized effectively?
- Performing Data Validation in an optimal manner
- Use of mapPartitions
- Partitioning Strategies
- Hash Partitioner
- Use of Range Partitioner
- Writing and plugging in a custom partitioner
- Caching and Checkpointing
- When to Cache?
- How Caching Helps
- Caching Strategies
- How Spark plans change when caching is on
- Caching on Spark UI
- Role of Alluxio
- Checkpointing
- How Caching is different from Checkpointing
- Joins
- Why optimizing joins is important
- Types of Joins
- Quick Recap of MapReduce Map-Side Joins
- Broadcasting
- Bucketing
- Spark SQL Optimization
- DataFrames vs Datasets
- About Tungsten
- Data Partitioning
- Query Optimizer: Catalyst Optimizer
- Debugging Spark Queries
- Explain Plan
- Partitioning & Bucketing in Spark SQL
- Best Practices for writing Spark SQL code
- Spark SQL with Binary Data formats
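As a small taste of the Spark Core API topics in the outline above, the sketch below illustrates why reduceByKey usually moves less data across the network than groupByKey. It is a plain-Python simulation, not Spark code and not course material: the function names, the sample data, and the "records shuffled" counter are all assumptions made for illustration.

```python
from collections import defaultdict


def shuffle_group_by_key(partitions):
    """Simulate a groupByKey-style sum: every record crosses the shuffle."""
    shuffled = [record for part in partitions for record in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def shuffle_reduce_by_key(partitions):
    """Simulate a reduceByKey-style sum: combine inside each partition first."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value  # map-side combine, before any shuffle
        shuffled.extend(local.items())  # one record per key per partition
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


# Hypothetical input: two partitions of (word, 1) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1), ("c", 1)],
]

grouped_totals, grouped_shuffled = shuffle_group_by_key(partitions)
reduced_totals, reduced_shuffled = shuffle_reduce_by_key(partitions)

print(grouped_totals == reduced_totals)    # same answer either way
print(grouped_shuffled, reduced_shuffled)  # records sent through the shuffle
```

Both approaches produce the same totals, but the pre-combined version sends fewer records through the simulated shuffle. In PySpark the corresponding RDD calls are `groupByKey()` and `reduceByKey(func)`; actual savings depend on how often keys repeat within each partition.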
Elite Instructor Program
We recently launched our internal Elite Instructor Program. This community-driven program is designed to support instructors in transforming students’ lives by consistently showing a world-class level of engagement, ability, and teaching prowess. Reach out today to learn more about our instructors.
Customized Technical Learning Solutions to Help Attract and Retain Talented Developers
Let DI help you design solutions to onboard, upskill or reskill your software development organization. Fully customized. 100% guaranteed.
DevelopIntelligence leads technical and software development learning programs for Fortune 500 companies. We provide learning solutions for hundreds of thousands of engineers for over 250 global brands.
“I appreciated the instructor’s technique of writing live code examples rather than using fixed slide decks to present the material.”
VMware
About Us
DevelopIntelligence has been in the technical/software development learning and training industry for nearly 20 years. We’ve provided learning solutions to more than 48,000 engineers, across 220 organizations worldwide.
- Boulder, Colorado Headquarters: 980 W. Dillon Road, Louisville, CO 80027
© 2013 - 2022 DevelopIntelligence LLC - Privacy Policy