
Introduction to High-Performance GPU Architectures

Course Summary

The Introduction to High-Performance GPU Architectures training course introduces the programming techniques required to develop general-purpose software applications for GPU hardware.

The course begins by examining the programming models of both OpenCL and NVIDIA's CUDA development framework. Next, students learn how GPU hardware architectures differ from traditional CPU architectures and how the programming environment changes as a result (development, debugging, and validation). The course concludes with optimization strategies.
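To give a flavor of the CUDA programming model covered in the course, the sketch below shows the canonical host/device pattern: allocate device memory, copy inputs over, launch a grid of thread blocks, and copy the result back. This is an illustrative sketch only (the `vecAdd` kernel and sizes are not course material), and it uses explicit `cudaMemcpy` transfers, which match the Fermi-generation lab hardware described below.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Each thread computes one output element: the block/thread hierarchy
// maps directly onto the data.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard: the grid may overshoot n
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers and host-to-device copies
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch: enough 256-thread blocks to cover n elements
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    // Copy the result back (cudaMemcpy synchronizes with the kernel)
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```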

The DevelopIntelligence remote lab environment uses NVIDIA hardware (GeForce GTX 480 and Tesla C2070) to illustrate CUDA/OpenCL concepts and to let training participants experimentally investigate performance issues, debugging techniques, and code examples.

Purpose
Learn the CUDA programming, profiling, and debugging techniques required to develop general-purpose software applications for GPU hardware.
Audience
Software developers who need to implement high-performance applications (e.g., numerical computing in finance and engineering).
Role
Software Developer
Skill Level
Intermediate
Style
Hack-a-thon - Learning Spikes - Workshops
Duration
1 Day
Related Technologies
CUDA

 

Productivity Objectives
  • Describe programming models of both OpenCL and NVIDIA's CUDA development framework.
  • Explore how GPU hardware architectures differ from traditional CPU architectures.
  • Evaluate CUDA programming, profiling, and debugging techniques.

What You'll Learn:

In the Introduction to High-Performance GPU Architectures training course, you'll learn:
  • OpenCL/CUDA Programming Model
  • Stream Computing and SIMD Platforms
  • Threads and Thread Hierarchy
  • Memory Hierarchy
  • Synchronization
  • Host and Device Interactions
  • GPU Device Architecture
  • Streaming Multiprocessors and Scalar Processors
  • On-chip Memory: Registers and Local Shared Memory
  • Execution Model: Warp Scheduling and Divergence
  • Device Memory and Latency
  • Performance Tuning and Optimization
  • Instruction Performance
  • Memory Access Patterns
  • Global Memory Coalescence
  • Local Memory Bank Conflicts
  • Optimization Strategies
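The memory-access topics above can be illustrated with a pair of kernels: one whose consecutive threads read consecutive addresses (coalesced), and one that strides through memory, which on Fermi-class hardware such as the lab's GTX 480 forces many more memory transactions per warp and much lower effective bandwidth. This is an illustrative sketch, not course material:

```cuda
// Coalesced: consecutive threads in a warp touch consecutive
// addresses, so the warp's 32 loads combine into a few wide
// memory transactions.
__global__ void copyCoalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Strided: consecutive threads are `stride` elements apart, so each
// load can require its own memory transaction -- the hardware cannot
// coalesce them, and effective bandwidth drops sharply as the
// stride grows.
__global__ void copyStrided(const float *in, float *out, int n, int stride) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n)
        out[i] = in[i];
}
```

Timing both kernels over the same buffer (for example with CUDA events or a profiler) makes the cost of uncoalesced access patterns concrete.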
“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware
