Xiaomi Unveils MiMo-V2-TTS: Giving Agents a Voice with Soul
Xiaomi has introduced MiMo-V2-TTS, an in-house large speech-synthesis model. It combines a custom audio tokenizer, a multi-codebook architecture, large-scale pre-training on more than a hundred million hours of data, and multi-dimensional reinforcement learning. Xiaomi says the result is fine-grained style control, dialect support, role-play, and high-quality singing, with the aim of giving AI agents expressive, human-like voices.
