Can MIT’s Attention Matching Cut LLM Memory 50× Without Accuracy Loss?
MIT researchers introduce Attention Matching, a latent‑space KV‑cache compaction technique that reduces large‑language‑model memory usage up to 50‑fold with negligible precision loss, outperforming token‑pruning, summarization, and prior compaction methods across benchmarks like QuALITY, LongHealth, and AIME‑2025.
