Hardware Fundamentals
- Moore's law and the end of frequency scaling: why parallelism matters
- CPU architecture recap: superscalar execution, out-of-order execution, branch prediction, speculative execution
- SIMD concept: single instruction, multiple data, data-level parallelism
- Vector registers and vector width: 128-bit, 256-bit, 512-bit
- Memory bandwidth vs compute: roofline model, arithmetic intensity
- Latency vs throughput: pipelining, instruction-level parallelism
- Chip families overview: x86 (Intel, AMD), ARM (including Apple Silicon), RISC-V
- Thermal and power constraints: TDP, power efficiency, dark silicon
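
Two of the topics above reduce to simple arithmetic: how many elements fit in a vector register of a given width, and the roofline bound on attainable performance. The sketch below is only illustrative. The function names and the machine figures (1000 GFLOP/s peak, 100 GB/s memory bandwidth) are hypothetical values chosen for the example, not measurements of any real chip.

```python
def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements one SIMD instruction processes per register."""
    return vector_bits // element_bits


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof.

    flops_per_byte is the arithmetic intensity of the kernel.
    """
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical machine parameters (assumptions for illustration only).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

# Arithmetic intensity at which a kernel stops being memory-bound.
ridge_point = PEAK_GFLOPS / BANDWIDTH_GBS

if __name__ == "__main__":
    for width in (128, 256, 512):
        print(f"{width}-bit register: {lanes(width, 32)} x 32-bit lanes")

    print(f"ridge point: {ridge_point} FLOP/byte")
    for intensity in (0.25, 10.0, 40.0):
        bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
        regime = "memory-bound" if intensity < ridge_point else "compute-bound"
        print(f"AI={intensity} FLOP/byte: at most {bound} GFLOP/s ({regime})")
```

On this assumed machine, a kernel with an arithmetic intensity of 0.25 FLOP/byte is capped by memory bandwidth far below the compute peak, so wider vectors alone would not speed it up.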