Tagged articles
2 articles
Page 1 of 1
Data Party THU
Data Party THU
Apr 30, 2026 · Artificial Intelligence

Turning Transformers into Mamba: How Apple Linearized Inference Costs

Apple introduced a two‑step cross‑architecture distillation method that converts costly quadratic‑time Transformers into cheaper linear‑time Mamba models, preserving most of the original performance while dramatically reducing inference cost.

AI researchLinear AttentionMamba
0 likes · 8 min read
Turning Transformers into Mamba: How Apple Linearized Inference Costs
Machine Heart
Machine Heart
Apr 22, 2026 · Artificial Intelligence

Apple Turns Transformers into Mamba with Linear‑Cost Distillation

Apple proposes a two‑step cross‑architecture distillation that converts expensive, high‑performing Transformers into cheaper, nearly equally strong Mamba models by first replacing softmax attention with learned linear attention (Hedgehog) and then embedding this intermediate form into Mamba, achieving comparable perplexity and downstream task performance with far lower inference cost.

Artificial IntelligenceLinear AttentionMamba
0 likes · 7 min read
Apple Turns Transformers into Mamba with Linear‑Cost Distillation