Skip to content

Triton, TPUs, and Vulkan

  • Triton: Python-based GPU kernel programming, block-level abstraction, auto-tuning
  • Writing Triton kernels: tl.load, tl.store, tl.dot, masking, grid/block programs
  • Triton vs CUDA: productivity vs control tradeoff, when to use each
  • Flash Attention as a case study: memory-efficient attention via tiling, online softmax
  • TPU architecture: systolic arrays, MXU (matrix multiply unit), HBM, ICI interconnect
  • TPU programming: XLA compiler, JAX/pjit, GSPMD, sharding annotations
  • Vulkan compute: compute shaders, SPIR-V, descriptor sets, command buffers
  • Comparison: GPU (CUDA/Triton) vs TPU (JAX/XLA) vs Vulkan, choosing the right tool