Tagged articles
4 articles
Xiaomi Tech
May 30, 2026 · Artificial Intelligence

How Xiaomi’s MiMo V2.5 Achieves 99% API Price Cut with Full‑Stack Inference Optimizations

The MiMo‑V2.5 series combines Hybrid Sliding‑Window Attention, Mixture‑of‑Experts and multimodal support with a complete redesign of KVCache management, tiered caching, prefix‑tree logic and scheduling. Together these changes compress the KVCache to about one‑seventh of the size needed by full‑attention models and deliver up to 40% faster Prefill and 30% lower time to first token (TTFT), cutting inference costs enough to enable a 99% API price reduction.

Hybrid SWA · Inference Optimization · KVCache
Lao Guo's Learning Space
Apr 30, 2026 · Artificial Intelligence

Xiaomi Opens MiMo‑V2.5 and Gives 100 Trillion Free Tokens – A Must‑Grab

Xiaomi has open‑sourced its MiMo‑V2.5 series, including a 1.02 T‑parameter Pro model, and is giving developers up to 100 trillion free tokens for 30 days; the article details the models' token‑efficiency benchmarks, a macOS‑like demo, MIT‑license benefits, and step‑by‑step usage instructions.

AI benchmarking · Large Language Model · MIT license
SuanNi
Apr 26, 2026 · Artificial Intelligence

Xiaomi’s MiMo‑V2.5: Halving Cost, Doubling Efficiency with a New Multimodal LLM

Xiaomi unveiled the MiMo‑V2.5 and MiMo‑V2.5‑Pro large language models, highlighting up to 50% lower API cost, multimodal perception, token‑efficiency gains, benchmark superiority over Claude Opus 4.6 and GPT‑5.4, and real‑world demos that built a full compiler in 4.3 hours and a video‑editing web app in 11.5 hours.

AI agent · Large Language Model · MiMo V2.5