x86 and AVX

  • x86 SIMD evolution: MMX → SSE → SSE2/3/4 → AVX → AVX2 → AVX-512 → AMX
  • AVX/AVX2 programming: 256-bit YMM registers, intrinsics (_mm256_*), FMA instructions
  • AVX-512: 512-bit ZMM registers, mask registers, gather/scatter, conflict detection
  • Intel AMX: tile registers, TMUL (tile matrix multiply), BF16/INT8 acceleration
  • Memory alignment: aligned vs unaligned loads, cache line considerations
  • Performance pitfalls: AVX frequency throttling, register pressure, lane crossing penalties
  • Benchmarking and profiling: RDTSC, perf, VTune, likwid
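
As a small illustration of the AVX2, FMA, and unaligned-load topics listed above, here is a minimal sketch of a single-precision dot product. It assumes GCC or Clang on x86-64. The function names (dot_scalar, dot_avx2, dot_f32) are hypothetical and chosen only for this example. The AVX2 path is enabled per function with a target attribute and selected at run time, so the file should build without -mavx2 -mfma and fall back to scalar code on CPUs that lack those extensions.

```c
#include <immintrin.h>
#include <stddef.h>

/* Scalar reference path, also used as the fallback. */
static float dot_scalar(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* AVX2 + FMA path: processes 8 floats per iteration in one YMM register.
 * The target attribute (a GCC/Clang extension) enables these instructions
 * for this function only. */
__attribute__((target("avx2,fma")))
static float dot_avx2(const float *a, const float *b, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Unaligned loads: no 32-byte alignment is assumed for a or b. */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_fmadd_ps(va, vb, acc); /* acc = va * vb + acc */
    }

    /* Horizontal sum: fold the upper 128-bit lane onto the lower one,
     * then reduce the remaining four floats to one. */
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 s = _mm_add_ps(lo, hi);
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 0x1));
    float sum = _mm_cvtss_f32(s);

    /* Scalar tail for the last n % 8 elements. */
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Runtime dispatch: use the AVX2 path only if the CPU reports support. */
float dot_f32(const float *a, const float *b, size_t n) {
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) {
        return dot_avx2(a, b, n);
    }
    return dot_scalar(a, b, n);
}
```

A caller would pass two float arrays of equal length, for example dot_f32(x, y, len). Because the vector path sums in a different order than the scalar loop, results can differ in the last few bits for general floating-point inputs. A single accumulator also leaves FMA throughput unused, so a tuned version would likely keep several independent accumulators.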