MiniMax M1: Open‑Source LLM That Rivals Gemini 2.5 Pro in Long‑Context Benchmarks
MiniMax’s newly released open‑source M1 model, built on the Lightning Attention‑enhanced MiniMax‑01 base, delivers a context window of up to 1 million tokens, achieves near‑state‑of‑the‑art performance on MRCR and other long‑context benchmarks, and shows strong results in multilingual translation, code completion, and creative applications.
MiniMax M1 Model Release
After a long silence, MiniMax announced the open‑source release of its first reasoning model, MiniMax M1, kicking off a week‑long “MiniMax Week” of announcements on X.
Key Performance Highlights
MiniMax M1’s long‑context ability is claimed to be comparable to that of the top‑tier closed‑source model Gemini 2.5 Pro. In benchmark tests:
AIME 2024 logic‑math problems: mixed results, with both weak and strong cases.
LiveCodeBench programming tasks and SWE‑bench Verified code‑completion: average performance.
TAU‑bench (reasoning‑driven scenarios): 62.8% accuracy, approaching open‑source state‑of‑the‑art.
MRCR (Multi‑Round Co‑reference Resolution, also known as “4‑needle”): 62.8% accuracy, effectively matching Gemini 2.5 Pro.
Understanding MRCR
MRCR evaluates a model’s ability to track and resolve references across long, multi‑turn dialogues. The test inserts several similar‑looking requests (e.g., multiple poems about penguins) and then asks the model to retrieve a specific earlier response, demanding precise contextual understanding.
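The structure of such a test can be sketched in a few lines. The builder below is a toy illustration only (the request templates, topics, and probe wording are made up for this sketch, not taken from the real benchmark): it plants several near‑identical “needle” turns among distractors and records which earlier response the probe should retrieve.

```python
import random

def build_mrcr_prompt(n_needles=4, n_distractors=20, probe_index=2, seed=0):
    """Build a toy MRCR-style ("4-needle") retrieval task.

    Several near-identical requests (the "needles") are interleaved with
    distractor turns; the probe asks for one specific earlier response.
    Templates and topics are illustrative, not the real benchmark data.
    """
    rng = random.Random(seed)
    # The needles: several requests that look alike except for their answers.
    pairs = [("Write a short poem about penguins.", f"[penguin poem {chr(65 + i)}]")
             for i in range(n_needles)]
    # Distractor turns on unrelated topics.
    pairs += [(f"Write a short poem about topic-{t}.", f"[poem about topic-{t}]")
              for t in range(n_distractors)]
    rng.shuffle(pairs)

    transcript = []
    for request, response in pairs:
        transcript += [f"User: {request}", f"Assistant: {response}"]

    # Ground truth: the probe_index-th penguin poem in transcript order.
    penguin_responses = [resp for req, resp in pairs if "penguins" in req]
    answer = penguin_responses[probe_index - 1]
    probe = (f"User: Reproduce verbatim penguin poem number {probe_index}, "
             "counting in order of appearance.")
    return "\n".join(transcript + [probe]), answer
```

Because the needles differ only in their answers, simple keyword matching cannot solve the probe; the model has to track which response belongs to which occurrence across the whole transcript.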
Technical Foundations
The strong contextual performance stems from the Lightning Attention mechanism introduced in the earlier MiniMax‑01 base model. This linear‑attention design makes time and space complexity grow approximately linearly with sequence length, unlike the quadratic growth of traditional Transformers.
Consequently, when generating 64K tokens, MiniMax M1 consumes less than half the FLOPs of DeepSeek‑R1; at 100K tokens, it uses only about 25% of DeepSeek‑R1’s FLOPs.
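The complexity argument can be made concrete with a generic kernelized linear attention, shown below as a NumPy sketch. This is a simplification for illustration, not MiniMax’s actual Lightning Attention kernel: the point is that reassociating the matrix product replaces the n×n score matrix with a d×d summary, so cost grows linearly in sequence length n.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: the n x n score matrix makes cost O(n^2 * d)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    """Generic kernelized linear attention (illustrative, not the exact
    Lightning Attention kernel): with the feature map phi(x) = elu(x) + 1,
    computing phi(Q) @ (phi(K).T @ V) instead of
    (phi(Q) @ phi(K).T) @ V costs O(n * d^2)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, positive
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                # d x d summary, size independent of n
    Z = Qp @ Kp.sum(axis=0)      # per-query normalizer
    return (Qp @ KV) / (Z[:, None] + eps)
```

As a rough count, the attention term costs about 2·n²·d multiply‑adds in the softmax form versus about 2·n·d² in the linear form, so the gap widens roughly as n/d as the sequence grows, which is why the savings at 100K tokens are so much larger than at 64K.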
MiniMax‑01 and M1 share a 456B‑parameter Mixture‑of‑Experts architecture that activates 45.9B parameters per token. The maximum context length reaches 1M tokens, eight times that of DeepSeek‑R1.
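A quick back‑of‑the‑envelope clarifies what “effective activation” means for an MoE model: only the routed experts’ weights participate in each token’s forward pass, so per‑token compute scales with the 45.9B active parameters rather than the full 456B. The 2 × active‑parameters FLOPs estimate below is a common dense‑model rule of thumb, applied here as an approximation.

```python
# Figures from the MiniMax M1 release notes.
total_params = 456e9    # all experts plus shared weights
active_params = 45.9e9  # parameters actually used per token (MoE routing)

active_fraction = active_params / total_params
print(f"~{active_fraction:.1%} of weights are active per token")  # ~10.1%

# Rule-of-thumb forward cost: ~2 FLOPs per active parameter per token.
flops_per_token = 2 * active_params
print(f"~{flops_per_token:.2e} FLOPs per generated token")
```

So despite its 456B total size, the model’s per‑token compute is closer to that of a ~46B dense model.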
Model Variants
Two inference models are released: a 40K and an 80K version. The “40K” and “80K” refer to the upper limit of the extended‑thinking window, not the overall context length, which remains 1M tokens.
Practical Experiments
Using MiniMax‑M1, the author performed several real‑world tests:
Full‑document translation, including tables, formulas, and images, with near‑perfect fidelity.
Context‑aware translation that inserted original English terms for a user with a CET‑6 English level.
Extraction of specific chat records from a week‑long group conversation, accurately identifying user IDs and reconstructing dialogue.
Creative tasks such as generating summaries and recommendations for a collection of 34 Liu Cixin short stories, where MiniMax‑M1 succeeded while other models (e.g., DeepSeek) failed dramatically.
Observations and Limitations
While MiniMax‑M1 excels at long‑context reasoning and translation, it occasionally produces inaccurate factual answers (e.g., miscounting herbal ingredients in a classic text). The author also notes that the model’s code‑generation aesthetics still have room for improvement.
Future Outlook
The author anticipates further releases from MiniMax, including potential video models and audio models, and encourages the community to stay tuned for upcoming “big drops.”
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.