Why NUMA Slows Multithreaded Apps and How to Optimize It
This article explains NUMA architecture, its multithreaded performance overheads such as remote memory access, cache synchronization, context and mode switches, interrupt handling, TLB misses, and memory copies, and then presents optimization techniques like NUMA and CPU affinity, IRQ tuning, and large‑page usage.
