Parallel Computing with CUDA

The Parallel Computing with CUDA training course covers how to get started with the CUDA platform and leverage the power of modern NVIDIA GPUs. It introduces the basics of CUDA C, explains the architecture of the GPU, and presents solutions to common computational problems that are well suited to GPU acceleration.

COURSE AGENDA

  • Overview
  • Compilation Process
  • Hello, CUDA
  • Location Qualifiers
  • Execution Model
  • Grid and Block Dimensions
  • Error Handling
  • Device Introspection

  • Tools Overview
  • Using Nsight
  • Running CUDA Apps
  • Debugging
  • Profiling

  • History of GPU Computation
  • GPGPU Frameworks
  • Graphics Processor Architecture
  • Compute Capability
  • Choosing a Graphics Card

  • Overview
  • Element Addressing
  • Map
  • Gather
  • Scatter
  • Reduce
  • Scan

  • Overview
  • Global Memory
  • Constant & Texture Memory
  • Shared Memory
  • Register & Local Memory

  • Overview
  • Barrier Synchronization
  • Thread Synchronization Demo
  • Warp Divergence

  • Overview
  • Why Atomics?
  • Atomic Functions
  • Atomic Sum
  • Monte Carlo Pi

  • Overview
  • Events
  • Event API
  • Event Example
  • Pinned Memory
  • Streams
  • Stream API
  • Example (Single Stream)
  • Example (Multiple Streams)

  • Overview
  • Inline PTX
  • Driver API
  • Pinned Memory
  • Multi-GPU Programming
  • Thrust
  • Summary
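To give a flavor of the material in the agenda, here is a minimal sketch of the kind of program the early modules build toward: a map-style kernel using per-thread element addressing, a one-dimensional grid-and-block launch configuration, and basic error handling. The `CUDA_CHECK` macro is an illustrative helper written for this sketch, not part of the CUDA API, and the exact code shown in the course may differ.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative error-handling helper (a common idiom, not a CUDA built-in):
// check the result of every runtime API call and report failures.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err = (call);                                     \
    if (err != cudaSuccess) {                                     \
      fprintf(stderr, "CUDA error %s at %s:%d\n",                 \
              cudaGetErrorString(err), __FILE__, __LINE__);       \
      return 1;                                                   \
    }                                                             \
  } while (0)

// A map-style kernel: each thread squares one element in place.
__global__ void square(float* data, int n) {
  // Element addressing: compute this thread's global index.
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= data[i];
}

int main() {
  const int n = 1024;
  float host[n];
  for (int i = 0; i < n; ++i) host[i] = (float)i;

  // Allocate global memory on the device and copy the input over.
  float* dev = nullptr;
  CUDA_CHECK(cudaMalloc(&dev, n * sizeof(float)));
  CUDA_CHECK(cudaMemcpy(dev, host, n * sizeof(float),
                        cudaMemcpyHostToDevice));

  // Grid and block dimensions: a 1-D grid of 1-D blocks,
  // rounded up so every element gets a thread.
  int threads = 256;
  int blocks = (n + threads - 1) / threads;
  square<<<blocks, threads>>>(dev, n);
  CUDA_CHECK(cudaGetLastError());  // catch launch-time errors

  // Copy the result back and release the device allocation.
  CUDA_CHECK(cudaMemcpy(host, dev, n * sizeof(float),
                        cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(dev));

  printf("host[3] = %f\n", host[3]);  // 3 squared
  return 0;
}
```

Compiling requires the CUDA toolkit (e.g. `nvcc square.cu -o square`) and running requires an NVIDIA GPU; the tools and profiling modules cover this workflow in detail.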