Xiaomi Slashes Token Prices by Up to 99% to Match DeepSeek’s API Pricing

The article analyzes the recent AI API price war, detailing DeepSeek’s step‑by‑step token‑price reductions, Xiaomi’s 99% cut that aligns its MiMo‑V2.5 Pro tier with DeepSeek, the underlying technical optimizations that enable lower costs, and the broader market shift toward cost‑driven competition.

SuanNi

DeepSeek’s Rapid Price Reductions

On May 22, DeepSeek announced a permanent price cut for its V4‑Pro API, a move Xiaomi later matched with discounts of up to 99%. The original V4‑Pro pricing (April 25) was 0.1 yuan per million tokens for cache‑hit input, 12 yuan for cache‑miss input, and 24 yuan for output. One day later, DeepSeek reduced the cache‑hit input price to one‑tenth of the launch price (0.025 yuan per million tokens) and added a limited‑time discount to 25% of list price (75% off).

The limited‑time offer, initially set to end on May 5, was extended to May 31. On May 22 DeepSeek made the discount permanent, so V4‑Pro pricing settled at one‑quarter of the original rate.
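To make the arithmetic concrete, here is a minimal sketch of how a per-request bill changes when prices fall to one-quarter of the launch rate. The launch prices are the figures quoted above. The function name, the sample token counts, and the assumption that the discount scales all three rates uniformly are illustrative and not taken from DeepSeek's documentation.

```python
# Launch prices quoted in the article, in yuan per million tokens.
LAUNCH_PRICES = {"cache_hit": 0.1, "cache_miss": 12.0, "output": 24.0}


def request_cost(cache_hit_tokens, cache_miss_tokens, output_tokens,
                 prices=LAUNCH_PRICES, multiplier=1.0):
    """Return the cost in yuan of one request.

    `multiplier` scales every price uniformly (an assumption made for
    this sketch): 1.0 is the launch rate, 0.25 is one-quarter of it.
    """
    yuan_times_million = (
        cache_hit_tokens * prices["cache_hit"]
        + cache_miss_tokens * prices["cache_miss"]
        + output_tokens * prices["output"]
    )
    return yuan_times_million * multiplier / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: mostly cached input, a short completion.
    usage = dict(cache_hit_tokens=800_000,
                 cache_miss_tokens=200_000,
                 output_tokens=100_000)
    print(f"launch rate: {request_cost(**usage):.2f} yuan")
    print(f"one-quarter: {request_cost(**usage, multiplier=0.25):.2f} yuan")
```

With these sample counts, cache‑miss input and output account for almost the whole bill, which is why the cache‑hit ratio matters so much for the effective price a developer pays.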

Five Days Later Xiaomi Aligns

At midnight on May 27, Xiaomi’s MiMo announced a permanent price cut for its MiMo‑V2.5 series: discounts reach up to 99%, and tiered pricing based on context‑window length is eliminated.

CEO Lei Jun highlighted the 99% reduction on Weibo, noting that the post‑discount prices now exactly match DeepSeek’s long‑term V4‑Pro prices. Previously, MiMo‑V2 charged different rates for context windows of up to 256K tokens and for windows of 256K to 1M tokens, which made long‑context usage more expensive; the new V2.5 pricing removes this distinction, lowering the barrier for long‑context tasks.

The Token Plan was also adjusted: each unit of spend now buys 5–8× more usage, and all active user quotas were reset. The 100 TB Token creator incentive program was completed early on May 26; the permanent price cut and the quota resets continue to support developers after the free‑token bonus ends.

Technical Foundations of the Cut

Xiaomi attributes its ability to sustain low prices to engineering optimizations. Using SGLang HiCache with full support for SWA (sliding‑window attention), the system reduces KV‑Cache data movement across GPU memory, CPU memory, and SSD to roughly one‑seventh of the original volume, which raises cacheable token capacity nearly fivefold. Expert parallelism and input‑length bucketing further increase cluster throughput.
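Input‑length bucketing can be pictured as grouping requests of similar prompt length so that short and long inputs do not share a batch. The sketch below is a toy illustration of that idea, not Xiaomi's or SGLang's implementation: the bucket bounds and function names are hypothetical, and padding every input to the longest one in its batch is used only as a simple proxy for the cost of mixing lengths.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds in tokens; a real system would tune these.
BUCKET_BOUNDS = [1_024, 4_096, 16_384, 65_536, 262_144, 1_048_576]


def bucket_for(length, bounds=BUCKET_BOUNDS):
    """Return the smallest bucket bound that can hold `length` tokens."""
    i = bisect_left(bounds, length)
    if i == len(bounds):
        raise ValueError(f"input of {length} tokens exceeds the largest bucket")
    return bounds[i]


def group_by_bucket(lengths, bounds=BUCKET_BOUNDS):
    """Map each bucket bound to the input lengths routed to it."""
    groups = defaultdict(list)
    for n in lengths:
        groups[bucket_for(n, bounds)].append(n)
    return dict(groups)


def padded_tokens(batch):
    """Tokens processed if every input is padded to the longest in the batch."""
    return max(batch) * len(batch) if batch else 0


if __name__ == "__main__":
    lengths = [500, 900, 3_000, 200_000]
    mixed = padded_tokens(lengths)
    bucketed = sum(padded_tokens(b) for b in group_by_bucket(lengths).values())
    print(f"one mixed batch: {mixed} padded tokens")
    print(f"bucketed batches: {bucketed} padded tokens")
```

In this toy model, a single 200,000‑token prompt no longer forces three short prompts to be treated as if they were equally long. Production engines schedule far more finely than this, so the numbers show the direction of the effect rather than its size.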

These backend improvements are presented as the decisive factor that enables long‑term low‑price offerings.

Market Landscape and Competitive Dynamics

Beyond DeepSeek and Xiaomi, other players are moving in the opposite direction: ByteDance’s Doubao app launched three subscription tiers (68 CNY, 200 CNY, and 500 CNY per month); Zhipu has raised API prices three times this year, most recently by 10% on April 8; and Alibaba Cloud and Tencent Cloud increased AI‑related pricing in mid‑May.

Consequently, the industry has diverged into three trends within a week: subscription‑based charging (Doubao), price hikes by cloud providers, and permanent price cuts by DeepSeek and Xiaomi.

Implications

The price war is no longer solely about model size or benchmark performance; it now extends to inference frameworks, caching systems, and cluster resource scheduling. The ability to keep per‑token service costs low under high concurrency, long context, and multi‑turn workloads is becoming a core infrastructure capability.

Both DeepSeek and Xiaomi signal a shift from a capability‑premium phase to a cost‑constraint phase for domestic large‑model APIs, a pressure that is likely to spread to other model vendors.
