x86 and AVX
- x86 SIMD evolution: MMX → SSE → SSE2/3/4 → AVX → AVX2 → AVX-512 → AMX
- AVX/AVX2 programming: 256-bit YMM registers, intrinsics (_mm256_*), FMA instructions
- AVX-512: 512-bit ZMM registers, mask registers, gather/scatter, conflict detection
- Intel AMX: tile registers, TMUL (tile matrix multiply), BF16/INT8 acceleration
- Memory alignment: aligned vs unaligned loads, cache line considerations
- Performance pitfalls: AVX frequency throttling, register pressure, lane-crossing penalties
- Benchmarking and profiling: RDTSC, perf, VTune, LIKWID
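
As a rough illustration of the AVX2 intrinsics, FMA, and unaligned-load topics listed above, here is a minimal sketch of a single-precision dot product. It assumes GCC or Clang on x86-64: the `target` attribute and `__builtin_cpu_supports` are compiler-specific. The function names `dot`, `dot_scalar`, and `dot_avx2` are hypothetical and chosen only for this example. The runtime dispatch is one possible design, not the only one.

```c
#include <immintrin.h>
#include <stddef.h>

/* Scalar fallback for CPUs without AVX2/FMA (hypothetical helper). */
static float dot_scalar(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* AVX2 + FMA kernel. The target attribute (a GCC/Clang extension) enables
   these instruction sets for this function only, so no -mavx2 flag is
   assumed for the rest of the translation unit. */
__attribute__((target("avx2,fma")))
static float dot_avx2(const float *a, const float *b, size_t n) {
    __m256 acc = _mm256_setzero_ps();          /* 8 float lanes in one YMM register */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);    /* unaligned load: no 32-byte alignment required */
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_fmadd_ps(va, vb, acc);    /* acc = va * vb + acc, fused */
    }

    /* Horizontal reduction: fold the upper 128-bit lane onto the lower one,
       then reduce the remaining four floats to one. */
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 s  = _mm_add_ps(lo, hi);
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1));
    float sum = _mm_cvtss_f32(s);

    for (; i < n; ++i)                         /* scalar tail for n not divisible by 8 */
        sum += a[i] * b[i];
    return sum;
}

/* Runtime dispatch: use the vector kernel only when the CPU reports support. */
float dot(const float *a, const float *b, size_t n) {
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return dot_avx2(a, b, n);
    return dot_scalar(a, b, n);
}
```

The vector path sums in a different order than the scalar loop, so the two can differ in the last bits for general inputs. If the buffers are known to be 32-byte aligned, `_mm256_load_ps` could replace `_mm256_loadu_ps`.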