Inside Grok-5 and MiniMax-M3: Massive Model Upscale and New Sparse Attention Gains

xAI’s upcoming Grok-5 (Grok V9-Medium) is a 1.5-trillion-parameter model trained with extensive programming data from Cursor, part of a partnership linking SpaceX, Cursor, and xAI. MiniMax-M3 introduces a new sparse-attention architecture that is reported to speed up pre-fill by 9.7× and decode by 15.6×.

SuanNi

Grok-5

Training of Grok V9-Medium (Grok-5), a 1.5-trillion-parameter model, has finished. It is three times the size of the current production model, Grok-4 (0.5 T), and is scheduled for public release in two to three weeks. A substantial amount of programming data from the AI coding assistant Cursor was added during supplementary training, and more is planned.

All production Grok traffic currently runs on the 0.5 T V8-Small (Grok-4). Moving to 1.5 T parameters triples the model size, which is expected to deepen the model's reasoning and improve its handling of complex tasks.
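To put the parameter counts in perspective, here is a rough sketch of the raw weight storage each model would need. Only the 0.5 T and 1.5 T figures come from the article; the numeric precisions (bf16, fp8) and the helper name `weight_memory_tb` are illustrative assumptions, and the estimate ignores KV cache, activations, and whatever architecture (dense or mixture-of-experts) xAI actually uses.

```python
# Back-of-envelope weight storage for the two parameter counts quoted above.
# Assumption: bytes per parameter (2 for bf16, 1 for fp8) is chosen for
# illustration; the article does not say how Grok weights are stored.

GROK4_PARAMS = 0.5e12  # 0.5 T, from the article
GROK5_PARAMS = 1.5e12  # 1.5 T, from the article


def weight_memory_tb(num_params: float, bytes_per_param: float) -> float:
    """Raw weight storage in terabytes (1 TB = 1e12 bytes)."""
    return num_params * bytes_per_param / 1e12


if __name__ == "__main__":
    print(f"size ratio: {GROK5_PARAMS / GROK4_PARAMS:.1f}x")
    for name, params in (("Grok-4", GROK4_PARAMS), ("Grok-5", GROK5_PARAMS)):
        for label, nbytes in (("bf16", 2), ("fp8", 1)):
            tb = weight_memory_tb(params, nbytes)
            print(f"{name} weights at {label}: {tb:.1f} TB")
```

Under these assumptions the weights alone grow from about 1 TB to about 3 TB at bf16, which is why a threefold jump in parameters is also a serving-infrastructure change and not only a quality one.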

Cursor and SpaceX partnership

Cursor is an AI programming assistant with annual recurring revenue of $2 billion, a figure that doubled in three months, a growth rate that stands out in SaaS history. It captures real developers' coding trajectories, including intent, code generation, and error correction, which makes for richer training material than a plain GitHub crawl.

On 21 April, SpaceX announced a partnership with Cursor. Cursor can use SpaceX’s Colossus supercomputer for model training, and SpaceX holds an option to acquire Cursor for $600 billion; if it does not exercise the option, it pays $10 billion for the cooperation. The collaboration combines SpaceX’s compute power with Cursor’s data to advance the programming-focused AI track.

xAI uses Cursor’s coding data to train the Grok base model and a subsequent Composer model, while Cursor leverages Colossus to train its own Composer 2.5.

Musk previously claimed that Grok’s coding ability surpasses Cursor’s. By ingesting Cursor’s data, Grok aims to compete directly with OpenAI’s Codex and Anthropic’s Claude Code.

MiniMax-M3

MiniMax's engineering lead posted a teaser image hinting at a major upcoming release. MiniMax-M3 introduces a new sparse-attention mechanism; according to analysis by online users, it improves pre-fill speed by 9.7× and decode speed by 15.6×.
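The article does not describe how MiniMax-M3's sparse attention works, so the sketch below substitutes a generic sliding-window pattern purely to show why sparsity can help both phases: pre-fill computes fewer query-key scores, and each decode step reads fewer cached keys. The function names and the window size are hypothetical, and the ratios it prints are simple operation counts, not a reproduction of the reported 9.7× and 15.6× figures.

```python
# Counting attention work for dense causal attention versus a sliding window.
# Assumption: sliding-window sparsity is an illustrative stand-in; the actual
# MiniMax-M3 mechanism is not specified in the article.
from typing import Optional


def dense_causal_pairs(seq_len: int) -> int:
    """Query-key scores computed in pre-fill when token i attends to tokens 0..i."""
    return seq_len * (seq_len + 1) // 2


def windowed_causal_pairs(seq_len: int, window: int) -> int:
    """Query-key scores in pre-fill when token i attends to at most `window` recent tokens."""
    return sum(min(i + 1, window) for i in range(seq_len))


def decode_keys_read(context_len: int, window: Optional[int] = None) -> int:
    """Cached keys read to generate one new token (all of them if dense)."""
    return context_len if window is None else min(context_len, window)


if __name__ == "__main__":
    seq_len, window = 32_768, 2_048  # hypothetical sizes
    dense = dense_causal_pairs(seq_len)
    sparse = windowed_causal_pairs(seq_len, window)
    print(f"pre-fill score count ratio: {dense / sparse:.1f}x")
    ratio = decode_keys_read(seq_len) / decode_keys_read(seq_len, window)
    print(f"decode keys-read ratio: {ratio:.1f}x")
```

Real speedups depend on kernel efficiency, memory bandwidth, and how much of the model's time is spent in attention at all, so counts like these are at best an upper-level intuition for where the gains come from.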

MiniMax has also released the official M2 technical report, which closes out the M2 series, with MiniMax-M3 slated to launch next.

References:

https://x.com/elonmusk/status/2058787384364265734

https://x.com/SkylerMiao7/status/2059285750458544561

https://x.com/MiniMax_AI/status/2059473229253902516

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
