ARM and NEON

  • ARM architecture: load-store ISA, register file, condition codes, Thumb mode
  • ARM NEON: 128-bit SIMD, data types (int8, int16, float16, float32), register layout
  • NEON intrinsics: load/store (vld1, vst1), arithmetic (vadd, vmul, vmla), shuffle and permute
  • SVE and SVE2: scalable vector extensions, predicate registers, vector-length agnostic programming
  • Apple Silicon specifics: AMX (Apple's undocumented matrix coprocessor), performance cores vs efficiency cores
  • Practical examples: vectorised dot product, matrix multiply, image processing kernels
  • Auto-vectorisation: compiler flags, pragmas, loop patterns that help/hinder vectorisation
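The intrinsics and dot-product items in the list can be illustrated with a vectorised float dot product. This is a minimal sketch that assumes AArch64, because it uses `vaddvq_f32` (add across all four lanes). It uses the fused `vfmaq_f32`, and `vmlaq_f32` is the unfused multiply-accumulate named in the list. The name `dot_f32` is hypothetical. A scalar loop handles leftover elements and also covers builds without NEON.

```c
#include <stddef.h>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Dot product of two float arrays of length n (hypothetical helper name). */
float dot_f32(const float *a, const float *b, size_t n) {
    size_t i = 0;
    float sum = 0.0f;
#if defined(__ARM_NEON) && defined(__aarch64__)
    float32x4_t acc = vdupq_n_f32(0.0f);     /* four running partial sums */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);   /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);   /* load 4 floats from b */
        acc = vfmaq_f32(acc, va, vb);        /* acc += va * vb, fused */
    }
    sum = vaddvq_f32(acc);                   /* horizontal add of the 4 lanes */
#endif
    for (; i < n; i++)                       /* scalar tail, or whole array without NEON */
        sum += a[i] * b[i];
    return sum;
}
```

The vector path keeps four independent partial sums, so the floating-point additions happen in a different order than in a plain scalar loop. The result can therefore differ from the scalar sum in the last bits.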
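The shuffle/permute and image-processing items can be illustrated with an RGB-to-greyscale kernel. `vld3q_u8` de-interleaves 48 bytes of packed RGB into three registers that hold 16 reds, 16 greens and 16 blues, so no explicit shuffle is needed. The weights 77/150/29 are an assumed fixed-point approximation of the BT.601 luma coefficients (0.299, 0.587, 0.114) scaled by 256. The name `rgb_to_gray` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Interleaved RGB888 -> 8-bit grey with assumed weights 77/150/29 (sum = 256). */
void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, size_t npixels) {
    size_t i = 0;
#if defined(__ARM_NEON)
    const uint8x8_t wr = vdup_n_u8(77), wg = vdup_n_u8(150), wb = vdup_n_u8(29);
    for (; i + 16 <= npixels; i += 16) {
        uint8x16x3_t px = vld3q_u8(rgb + 3 * i);  /* val[0]=R, val[1]=G, val[2]=B */
        /* widening multiply-accumulate into 16-bit lanes (max 255*256 fits) */
        uint16x8_t lo = vmull_u8(vget_low_u8(px.val[0]), wr);
        lo = vmlal_u8(lo, vget_low_u8(px.val[1]), wg);
        lo = vmlal_u8(lo, vget_low_u8(px.val[2]), wb);
        uint16x8_t hi = vmull_u8(vget_high_u8(px.val[0]), wr);
        hi = vmlal_u8(hi, vget_high_u8(px.val[1]), wg);
        hi = vmlal_u8(hi, vget_high_u8(px.val[2]), wb);
        /* shift right by 8 while narrowing back to 8 bits, then store 16 pixels */
        vst1q_u8(gray + i, vcombine_u8(vshrn_n_u16(lo, 8), vshrn_n_u16(hi, 8)));
    }
#endif
    for (; i < npixels; i++) {                   /* scalar tail / non-NEON fallback */
        const uint8_t *p = rgb + 3 * i;
        gray[i] = (uint8_t)((77 * p[0] + 150 * p[1] + 29 * p[2]) >> 8);
    }
}
```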
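For the matrix-multiply item, a 4x4 single-precision multiply fits NEON's four-lane registers well. This sketch assumes column-major storage, where element (r, c) is at `m[4*c + r]`. It also assumes AArch64, because it uses the by-lane forms `vmulq_laneq_f32` and `vfmaq_laneq_f32`. Each column of C is built as a combination of the columns of A, weighted by the entries of one column of B. The name `mat4_mul` is hypothetical.

```c
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

/* C = A * B for 4x4 column-major float matrices (hypothetical helper name). */
void mat4_mul(const float *a, const float *b, float *c) {
#if defined(__ARM_NEON) && defined(__aarch64__)
    float32x4_t a0 = vld1q_f32(a),     a1 = vld1q_f32(a + 4);  /* columns of A */
    float32x4_t a2 = vld1q_f32(a + 8), a3 = vld1q_f32(a + 12);
    for (int j = 0; j < 4; j++) {
        float32x4_t bj = vld1q_f32(b + 4 * j);         /* column j of B */
        /* column j of C = a0*B[0][j] + a1*B[1][j] + a2*B[2][j] + a3*B[3][j] */
        float32x4_t cj = vmulq_laneq_f32(a0, bj, 0);
        cj = vfmaq_laneq_f32(cj, a1, bj, 1);
        cj = vfmaq_laneq_f32(cj, a2, bj, 2);
        cj = vfmaq_laneq_f32(cj, a3, bj, 3);
        vst1q_f32(c + 4 * j, cj);
    }
#else
    for (int j = 0; j < 4; j++)                        /* scalar reference version */
        for (int i = 0; i < 4; i++) {
            float s = 0.0f;
            for (int k = 0; k < 4; k++)
                s += a[4 * k + i] * b[4 * j + k];
            c[4 * j + i] = s;
        }
#endif
}
```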
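For the auto-vectorisation item, both GCC and Clang run their loop vectorisers at `-O3`. They can report what they did: GCC with `-fopt-info-vec` / `-fopt-info-vec-missed`, and Clang with `-Rpass=loop-vectorize` / `-Rpass-missed=loop-vectorize`. The sketch below contrasts a loop written so that it can vectorise with one that usually does not. Both function names are hypothetical.

```c
#include <stddef.h>

/* y[i] += a * x[i]. Shaped to help the vectoriser:
   - simple counted loop with unit stride,
   - restrict pointers promise x and y do not overlap,
   - no calls or early exits in the body. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Usually hinders vectorisation: each iteration reads the value the
   previous iteration wrote (a loop-carried dependence). */
void prefix_sum(size_t n, float *v) {
    for (size_t i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```

Pragmas such as `#pragma omp simd` (enabled with `-fopenmp-simd`) or `#pragma clang loop vectorize(enable)` can request vectorisation. However, `omp simd` tells the compiler that vectorising is safe, so it must not be applied to a loop with a genuine dependence such as `prefix_sum`.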
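For the SVE item, the same `y += a * x` loop can be written in vector-length agnostic form. `svcntw()` reports how many 32-bit lanes the hardware vector holds. The predicate from `svwhilelt_b32_s64` switches off lanes at or beyond `n`, so no scalar tail loop is needed. This sketch assumes a compiler that supports the SVE ACLE (`arm_sve.h`, for example built with `-march=armv8-a+sve`). The name `saxpy_vla` is hypothetical, and the `#else` branch is a plain scalar fallback.

```c
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* y[i] += a * x[i], vector-length agnostic (hypothetical helper name). */
void saxpy_vla(int64_t n, float a, const float *x, float *y) {
#if defined(__ARM_FEATURE_SVE)
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {  /* lanes per vector */
        svbool_t pg = svwhilelt_b32_s64(i, n);  /* lane k active while i + k < n */
        svfloat32_t vx = svld1_f32(pg, x + i);  /* inactive lanes are not loaded */
        svfloat32_t vy = svld1_f32(pg, y + i);
        vy = svmla_n_f32_x(pg, vy, vx, a);      /* vy + vx * a on active lanes */
        svst1_f32(pg, y + i, vy);               /* only active lanes are stored */
    }
#else
    for (int64_t i = 0; i < n; i++)
        y[i] += a * x[i];
#endif
}
```

The same binary runs unchanged on 128-bit and wider SVE implementations, because the stride and the final partial iteration are both set at run time.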
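For the performance-versus-efficiency-core item, a program can query the core counts at run time. This sketch assumes macOS 12 or later on Apple Silicon, where the `hw.perflevel0.physicalcpu` sysctl key reports the performance cluster and `hw.perflevel1.physicalcpu` reports the efficiency cluster. `physical_cores_at_level` is a hypothetical helper that returns -1 on other systems or when a key is missing.

```c
#include <stddef.h>
#include <stdio.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Physical core count for a performance level, or -1 if unavailable.
   Assumed: level 0 = performance (P) cores, level 1 = efficiency (E) cores. */
int physical_cores_at_level(int level) {
#if defined(__APPLE__)
    char name[64];
    int value = 0;
    size_t len = sizeof value;
    snprintf(name, sizeof name, "hw.perflevel%d.physicalcpu", level);
    if (sysctlbyname(name, &value, &len, NULL, 0) == 0)
        return value;
#else
    (void)level;  /* not Apple: no perflevel keys to query */
#endif
    return -1;
}
```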