Is 3‑Bit KV Cache the Ultimate Solution? An In‑Depth Evaluation of Google’s TurboQuant
Through ten experiments on three LLMs, this study measures TurboQuant’s 3‑bit KV‑cache compression. Quality remains strong, but speed gains vary by model and memory savings depend on the implementation; an attention‑entropy analysis explains why 2‑bit compression degrades performance.
