Kuaishou Tech
May 14, 2026 · Artificial Intelligence
Open‑Source Kwai Summary Attention (KSA): A Sequence‑Compression Mechanism for Long‑Context Inference
KSA inserts learnable summary tokens to compress the KV cache by a factor of eight, enabling accurate long‑context retrieval at far lower memory and compute cost. On large‑scale benchmarks, it consistently outperforms full‑attention and other hybrid methods.
Efficient Inference · KSA · KV cache reduction
13 min read
