Industry Insights 5 min read

DeepSeek Slashes V4 Pro to 25% of Original Price Forever—Is Token Cost Anxiety Finally Relieved?

DeepSeek announced a permanent 75% discount for V4 Pro, reducing cache‑hit token costs to $0.003625 per million, prompting developers to share lower bills, swap Claude Code back‑ends via a single environment variable, and spark industry debate over pricing, privacy, and AI stack design.

AI Engineering

May 23, 2026

DeepSeek Slashes V4 Pro to 25% of Original Price Forever—Is Token Cost Anxiety Finally Relieved?

DeepSeek’s official channels announced that the V4 Pro model’s limited‑time 75% discount is now permanent, cutting the price to roughly one‑quarter of its original rate.

The new pricing breaks down as follows: cache‑hit input costs $0.003625 per million tokens (down from $0.0145), cache‑miss input $0.435 per million (down from $1.74), and output $0.87 per million (down from $3.48). Compared with domestic competitors, the model’s cost is now effectively 25% of the previous price, with no expiration.

The key innovation is the near‑free cache‑hit rate, meaning repeated content—such as uploaded project documents or continuous conversation history—incurs negligible token fees (only $0.003625 per million tokens, i.e., $0.036 for ten million repeated tokens).

Developers have posted real‑world usage data: one user consumed 1.5 billion AI tokens over 20 days at a cost far below that of competing models. Another developer leveraged DeepSeek’s Anthropic‑compatible API endpoint, changing only one environment variable ( ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic) to replace Claude Code’s backend with DeepSeek V4 Pro, preserving the CLI experience while dramatically lowering expenses.

Further experimentation includes task‑level worker splitting—using DeepSeek V4 Pro for complex implementations and DeepSeek Light for quick fixes—and the creation of a completely free coding‑assistant tool that supports multiple models (DeepSeek V4 Pro/Flash, Kimi K2.6, MiniMax M2.7) by covering costs with minimal text‑only ads ( npm i -g freebuff).

Community observations highlight several trends: (1) Companies facing rapid budget drain from AI agents find the cheap cache ideal, suggesting a future “routing” architecture where inexpensive models handle repetitive loops while cutting‑edge models tackle complex reasoning; (2) DeepSeek’s pricing reshapes developer expectations, lowering the barrier for individual developers; (3) Near‑free caching removes the need to aggressively prune context, enabling full‑document and full‑history inputs for tasks like automated code iteration and long‑term project tracking.

Nevertheless, concerns remain about data privacy, retention policies, and tooling maturity—some developers note the need to manually install VS Code extensions to integrate with GitHub Copilot chat.

Overall, the AI market appears to be entering a price‑war phase: what once required large‑scale corporate budgets is now affordable for individuals, shifting competition from pure model performance to ecosystem completeness, privacy transparency, and user trust.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

cost optimization DeepSeek industry analysis AI pricing Anthropic API V4 Pro token cache

Written by

AI Engineering

Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.