GPU Architecture and CUDA

  • GPU vs CPU: throughput-oriented design, thousands of cores, SIMT execution model
  • GPU memory hierarchy: global memory, shared memory, registers, L1/L2 cache, constant memory
  • CUDA programming model: grids, blocks, threads, warps (32 threads), warp divergence
  • Kernel launch: grid/block dimensions, occupancy, register usage
  • Memory access patterns: coalesced access, bank conflicts in shared memory, memory fences
  • Synchronisation: __syncthreads(), atomic operations, cooperative groups
  • Streams and concurrency: overlapping compute and data transfer, multi-stream execution
  • Profiling: Nsight Compute, Nsight Systems, occupancy calculator
  • NVIDIA GPU generations: Volta (tensor cores), Ampere (TF32, sparsity), Hopper (transformer engine, FP8), Blackwell
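
The grid/block/thread indexing and coalesced-access points above can be sketched with a vector-add kernel. This is a minimal illustration, not production code; the names (`vecAdd`, `threads`, `blocks`) are arbitrary, and managed memory is used only to keep the host code short.

```cuda
#include <cuda_runtime.h>

// Each thread computes one element. Consecutive threads in a warp touch
// consecutive addresses, so global loads/stores coalesce into wide
// transactions; a strided or misaligned pattern would waste bandwidth.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard: the grid may overshoot n
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));

    int threads = 256;                          // a multiple of the warp size (32)
    int blocks  = (n + threads - 1) / threads;  // ceil-divide so every element is covered
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The block size of 256 is a common starting point; the right value for a given kernel depends on register usage and occupancy, which the profilers listed above can report.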
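
The shared-memory, `__syncthreads()`, and atomics bullets fit together in a block-wide sum reduction. A hypothetical sketch, assuming a fixed block size of 256:

```cuda
#include <cuda_runtime.h>

// Block-wide sum reduction in shared memory. The halving-stride pattern
// keeps active threads contiguous, which avoids shared-memory bank
// conflicts, and each __syncthreads() makes every write from the previous
// step visible to the whole block before the next step reads it.
__global__ void blockSum(const float* in, float* out, int n) {
    __shared__ float s[256];                 // one slot per thread in the block
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    s[tid] = (i < n) ? in[i] : 0.0f;         // stage one element per thread
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();                     // barrier between reduction steps
    }

    if (tid == 0) atomicAdd(out, s[0]);      // combine per-block partial sums
}
```

Note that `__syncthreads()` is a block-level barrier only; combining results across blocks needs atomics (as here), a second kernel launch, or cooperative groups.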
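
The streams bullet can be sketched by splitting a buffer into chunks and giving each chunk its own stream, so one chunk's transfers overlap another chunk's compute. The chunk sizes and the `scale` kernel are illustrative placeholders; overlap also requires pinned host memory, allocated here with cudaMallocHost.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int nChunks = 4, chunk = 1 << 20;
    float *h, *d;
    cudaMallocHost(&h, (size_t)nChunks * chunk * sizeof(float)); // pinned host memory
    cudaMalloc(&d, (size_t)nChunks * chunk * sizeof(float));

    cudaStream_t streams[nChunks];
    for (int s = 0; s < nChunks; ++s) cudaStreamCreate(&streams[s]);

    // Each chunk's H2D copy, kernel, and D2H copy are ordered within its
    // stream, but chunks in different streams can run concurrently, so
    // the copy engines and SMs stay busy at the same time.
    for (int s = 0; s < nChunks; ++s) {
        size_t off = (size_t)s * chunk;
        cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<chunk / 256, 256, 0, streams[s]>>>(d + off, chunk);
        cudaMemcpyAsync(h + off, d + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < nChunks; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h); cudaFree(d);
    return 0;
}
```

Nsight Systems timelines make this overlap visible: with a single stream the copies and kernels serialise, while with multiple streams they tile across the copy engines and SMs.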