MiniMax M1: Open‑Source LLM That Rivals Gemini 2.5 Pro in Long‑Context Benchmarks
MiniMax’s newly released open‑source M1 model, built on the Lightning Attention‑enhanced MiniMax‑01 base, delivers a context window of up to 1 million tokens, achieves near‑state‑of‑the‑art performance on MRCR and other long‑context benchmarks, and shows strong results in multilingual translation, code completion, and creative applications.
MiniMax M1 Model Release
After a long silence, MiniMax announced the open‑source release of its first reasoning model, MiniMax M1, kicking off a week‑long “MiniMax Week” of announcements on X.
Key Performance Highlights
MiniMax M1’s long‑context ability is claimed to be comparable to that of the top‑tier closed‑source model Gemini 2.5 Pro. In benchmark tests:
AIME 2024 logic‑math problems: mixed results, with both weak and strong cases.
LiveCodeBench programming tasks and SWE‑bench Verified code‑completion: average performance.
TAU‑bench (reasoning‑driven scenarios): 62.8% accuracy, approaching open‑source state‑of‑the‑art.
MRCR (Multi‑Round Co‑reference Resolution, also known as “4‑needle”): 62.8% accuracy, effectively matching Gemini 2.5 Pro.
Understanding MRCR
MRCR evaluates a model’s ability to track and resolve references across long, multi‑turn dialogues. The test inserts several similar‑looking requests (e.g., multiple poems about penguins) and then asks the model to retrieve a specific earlier response, demanding precise contextual understanding.
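The structure of such a test can be sketched in a few lines. The builder below is a toy illustration only (the request templates, topics, and probe wording are made up for this sketch, not taken from the real benchmark): it plants several near‑identical “needle” turns among distractors and records which earlier response the probe should retrieve.

```python
import random

def build_mrcr_prompt(n_needles=4, n_distractors=20, probe_index=2, seed=0):
    """Build a toy MRCR-style ("4-needle") retrieval task.

    Several near-identical requests (the "needles") are interleaved with
    distractor turns; the probe asks for one specific earlier response.
    Templates and topics are illustrative, not the real benchmark data.
    """
    rng = random.Random(seed)
    # The needles: several requests that look alike except for their answers.
    pairs = [("Write a short poem about penguins.", f"[penguin poem {chr(65 + i)}]")
             for i in range(n_needles)]
    # Distractor turns on unrelated topics.
    pairs += [(f"Write a short poem about topic-{t}.", f"[poem about topic-{t}]")
              for t in range(n_distractors)]
    rng.shuffle(pairs)

    transcript = []
    for request, response in pairs:
        transcript += [f"User: {request}", f"Assistant: {response}"]

    # Ground truth: the probe_index-th penguin poem in transcript order.
    penguin_responses = [resp for req, resp in pairs if "penguins" in req]
    answer = penguin_responses[probe_index - 1]
    probe = (f"User: Reproduce verbatim penguin poem number {probe_index}, "
             "counting in order of appearance.")
    return "\n".join(transcript + [probe]), answer
```

Because the needles differ only in their answers, simple keyword matching cannot solve the probe; the model has to track which response belongs to which occurrence across the whole transcript.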
Technical Foundations
The strong contextual performance stems from the Lightning Attention mechanism introduced in the earlier MiniMax‑01 base model. This linear‑attention design makes time and space complexity grow approximately linearly with sequence length, unlike the quadratic growth of traditional Transformers.
Consequently, when generating 64K tokens, MiniMax M1 consumes less than half the FLOPs of DeepSeek‑R1; at 100K tokens, it uses only about 25% of DeepSeek‑R1’s FLOPs.
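The complexity argument can be made concrete with a generic kernelized linear attention, shown below as a NumPy sketch. This is a simplification for illustration, not MiniMax’s actual Lightning Attention kernel: the point is that reassociating the matrix product replaces the n×n score matrix with a d×d summary, so cost grows linearly in sequence length n.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: the n x n score matrix makes cost O(n^2 * d)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    """Generic kernelized linear attention (illustrative, not the exact
    Lightning Attention kernel): with the feature map phi(x) = elu(x) + 1,
    computing phi(Q) @ (phi(K).T @ V) instead of
    (phi(Q) @ phi(K).T) @ V costs O(n * d^2)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, positive
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                # d x d summary, size independent of n
    Z = Qp @ Kp.sum(axis=0)      # per-query normalizer
    return (Qp @ KV) / (Z[:, None] + eps)
```

As a rough count, the attention term costs about 2·n²·d multiply‑adds in the softmax form versus about 2·n·d² in the linear form, so the gap widens roughly as n/d as the sequence grows, which is why the savings at 100K tokens are so much larger than at 64K.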
MiniMax‑01 and M1 share a 456B‑parameter Mixture‑of‑Experts architecture that activates 45.9B parameters per token. The maximum context length reaches 1M tokens, eight times that of DeepSeek‑R1.
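A quick back‑of‑the‑envelope clarifies what “effective activation” means for an MoE model: only the routed experts’ weights participate in each token’s forward pass, so per‑token compute scales with the 45.9B active parameters rather than the full 456B. The 2 × active‑parameters FLOPs estimate below is a common dense‑model rule of thumb, applied here as an approximation.

```python
# Figures from the MiniMax M1 release notes.
total_params = 456e9    # all experts plus shared weights
active_params = 45.9e9  # parameters actually used per token (MoE routing)

active_fraction = active_params / total_params
print(f"~{active_fraction:.1%} of weights are active per token")  # ~10.1%

# Rule-of-thumb forward cost: ~2 FLOPs per active parameter per token.
flops_per_token = 2 * active_params
print(f"~{flops_per_token:.2e} FLOPs per generated token")
```

So despite its 456B total size, the model’s per‑token compute is closer to that of a ~46B dense model.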
Model Variants
Two inference models are released: a 40K and an 80K version. The “40K” and “80K” refer to the upper limit of the extended‑thinking window, not the overall context length, which remains 1M tokens.
Practical Experiments
Using MiniMax‑M1, the author performed several real‑world tests:
Full‑document translation, including tables, formulas, and images, with near‑perfect fidelity.
Context‑aware translation that inserted original English terms for a user with a CET‑6 English level.
Extraction of specific chat records from a week‑long group conversation, accurately identifying user IDs and reconstructing dialogue.
Creative tasks such as generating summaries and recommendations for a collection of 34 Liu Cixin short stories, where MiniMax‑M1 succeeded while other models (e.g., DeepSeek) failed dramatically.
Observations and Limitations
While MiniMax‑M1 excels at long‑context reasoning and translation, it occasionally produces inaccurate factual answers (e.g., miscounting herbal ingredients in a classic text). The author also notes that the model’s code‑generation aesthetics still have room for improvement.
Future Outlook
The author anticipates further releases from MiniMax, including potential video models and audio models, and encourages the community to stay tuned for upcoming “big drops.”
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.