Inside Xiaomi’s MiMo‑V2‑Flash: How a Hybrid SWA Design Powers Fast, Efficient AI Reasoning

Xiaomi’s newly open‑sourced MiMo‑V2‑Flash model combines a hybrid sliding‑window/global attention architecture (Hybrid SWA) with a 309B‑parameter Mixture‑of‑Experts (MoE) design. It delivers top‑tier reasoning, coding and agent performance, and it introduces MOPD, an efficient post‑training paradigm that sharply reduces the compute cost of reinforcement learning (RL).
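As a rough illustration of what a hybrid sliding‑window/global layout means, the sketch below builds boolean attention masks for both layer types and a repeating layer schedule. The helper names `attention_mask` and `layer_schedule`, the window size and the 5:1 SWA‑to‑global ratio are hypothetical choices for illustration, not MiMo‑V2‑Flash's published configuration.

```python
from typing import List, Optional


def attention_mask(seq_len: int, window: Optional[int]) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    window=None gives full causal (global) attention; otherwise each query only
    sees the most recent `window` positions, itself included (sliding window).
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i  # no attending to future tokens
            in_window = window is None or i - j < window
            row.append(causal and in_window)
        mask.append(row)
    return mask


def layer_schedule(num_layers: int, swa_per_global: int) -> List[str]:
    """Hypothetical interleaving: `swa_per_global` SWA layers, then one global layer, repeated."""
    period = swa_per_global + 1
    return ["global" if (k + 1) % period == 0 else "swa" for k in range(num_layers)]


if __name__ == "__main__":
    print(layer_schedule(12, swa_per_global=5))
    for row in attention_mask(6, window=3):  # sliding-window layer
        print("".join("x" if allowed else "." for allowed in row))
```

Because a sliding‑window layer never looks further back than `window` tokens, only the periodic global layers carry information across the full context in this sketch.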

Hybrid SWA · Large Language Model · MOPD

AI Insight Log

Dec 18, 2025 · Artificial Intelligence

Xiaomi’s New MiMo‑V2‑Flash LLM Rivals DeepSeek‑V3.2 and Approaches GPT‑5 High

Xiaomi’s MiMo‑V2‑Flash is a 309B‑parameter MoE LLM with only 15B parameters active per token. It combines Hybrid SWA, Multi‑Token Prediction (MTP) and Multi‑Teacher On‑Policy Distillation (MOPD) to shrink the KV cache sixfold and speed up inference 2.6×. Its performance is comparable to DeepSeek‑V3.2 and Kimi‑K2 and approaches GPT‑5 High, including a 73.4% score on SWE‑Bench as a coding agent.
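To see why interleaving sliding‑window layers with a few global layers can cut the KV cache by roughly the layer ratio, here is a back‑of‑the‑envelope calculator. The function name `kv_cache_bytes` and every configuration value (48 layers, a 5:1 SWA‑to‑global ratio, a 128‑token window, 8 KV heads of dimension 128, 2‑byte elements) are hypothetical assumptions chosen for illustration, not the model's published settings.

```python
def kv_cache_bytes(seq_len: int, num_layers: int, swa_per_global: int, window: int,
                   kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size for one sequence under a hypothetical hybrid layout.

    Global layers cache K and V for every token; sliding-window layers only need
    the most recent `window` tokens. swa_per_global=0 means every layer is global.
    """
    period = swa_per_global + 1
    num_global = num_layers // period
    num_swa = num_layers - num_global
    per_token_per_layer = 2 * kv_heads * head_dim * bytes_per_elem  # K and V tensors
    cached_tokens = num_global * seq_len + num_swa * min(seq_len, window)
    return per_token_per_layer * cached_tokens


if __name__ == "__main__":
    # Hypothetical configuration, for illustration only.
    cfg = dict(num_layers=48, window=128, kv_heads=8, head_dim=128)
    for seq_len in (4_096, 32_768, 262_144):
        full = kv_cache_bytes(seq_len, swa_per_global=0, **cfg)
        hybrid = kv_cache_bytes(seq_len, swa_per_global=5, **cfg)
        print(f"{seq_len:>7} tokens: full {full / 2**30:.2f} GiB, "
              f"hybrid {hybrid / 2**30:.2f} GiB, ratio {full / hybrid:.2f}x")
```

Under these assumptions the saving tends toward 6× as the context grows, because only one layer in six keeps a full‑length cache; for short sequences the window term matters and the ratio is smaller.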

Hybrid SWA · Large Language Model · MOPD

Xiaomi Tech

Dec 17, 2025 · Artificial Intelligence

Xiaomi Open‑Sources MiMo‑V2‑Flash: An Ultra‑Efficient, Agent‑Ready Model

Xiaomi's MiMo‑V2‑Flash, a fully open‑source 309B MoE model with hybrid attention and Multi‑Token Prediction (MTP) acceleration, ranks in the global top two on agent benchmarks, delivers up to 2× faster inference, and costs only 2.5% as much as comparable closed‑source models.
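One common way multi‑token prediction speeds up decoding is to use the extra prediction heads as a draft that the main model verifies in a single pass, as in speculative decoding. Assuming that setup, and the simplifying textbook assumption that each drafted token is accepted independently with probability `accept_prob`, the expected number of tokens emitted per verification step is a short geometric sum. The function name and parameters are hypothetical; this is not Xiaomi's measurement method.

```python
def expected_tokens_per_step(accept_prob: float, num_draft: int) -> float:
    """Expected tokens emitted per verification step in a simplified speculative-decoding model.

    Assumes each of the `num_draft` drafted tokens is accepted independently with
    probability `accept_prob`, drafting stops at the first rejection, and the
    verifier always contributes one token of its own (a correction or a bonus token).
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be in [0, 1]")
    # Term i = 0 is the verifier's own token; term i >= 1 is the probability that
    # the i-th drafted token survives, i.e. all of the first i drafts are accepted.
    return sum(accept_prob ** i for i in range(num_draft + 1))


if __name__ == "__main__":
    for p in (0.6, 0.8, 0.9):
        print(f"accept_prob={p}: {expected_tokens_per_step(p, num_draft=3):.3f} tokens/step")
```

Tokens per step is an upper bound on the wall‑clock gain, since drafting and verification add their own cost, so real speedups such as the "up to 2×" figure will sit below this number.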

Efficient Inference · Hybrid Attention · MOPD