DeepSeek V4 Pro: 1.6 Trillion Parameters at a Price That Destroys OpenAI's Business Model

The Chinese lab DeepSeek has released two open-weight models supporting 1-million-token contexts, pricing the flagship at $1.74 per million input tokens, roughly a tenth of what competitors charge.

Illustrative photo: Depositphotos

On April 24, 2026, DeepSeek published preview versions of two models from the V4 series — DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both support context windows up to 1 million tokens and are distributed under the MIT license, meaning anyone can download, modify, and deploy them commercially.

What's Inside

V4-Pro contains 1.6 trillion parameters with 49 billion active. Flash has 284 billion total and 13 billion active. Both are built on a Mixture-of-Experts architecture: the model doesn't activate all parameters simultaneously, which significantly reduces inference costs. According to Hugging Face, in a scenario with a 1 million token context, V4-Pro requires only 27% of the computational cost of DeepSeek-V3.2 and 10% of its predecessor's KV-cache size.
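The MoE numbers above imply that only a small slice of the weights does any work on a given token. A back-of-the-envelope check, using only the parameter counts quoted in this article:

```python
# Illustrative arithmetic only: in a Mixture-of-Experts model, per-token
# compute scales with the ACTIVE parameter count, not the total.
def active_fraction(total_billions: float, active_billions: float) -> float:
    """Fraction of weights engaged for a single token."""
    return active_billions / total_billions

pro = active_fraction(1600, 49)    # V4-Pro: 1.6T total, 49B active
flash = active_fraction(284, 13)   # V4-Flash: 284B total, 13B active

print(f"V4-Pro activates {pro:.1%} of its weights per token")     # ~3.1%
print(f"V4-Flash activates {flash:.1%} of its weights per token")  # ~4.6%
```

That roughly 3% activation rate is what lets a 1.6-trillion-parameter model run at the inference cost of a much smaller dense one.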

Technically, this is achieved through hybrid attention — a combination of Compressed Sparse Attention and Heavily Compressed Attention, which substantially improves efficiency on long contexts. For a developer or analyst, this means the model can process a novel, codebase, or document corpus in a single request.

Benchmarks: Where It Excels, Where It Lags

In practice, V4-Pro outperforms Claude Opus 4.6 on Terminal-Bench 2.0 (67.9% versus 65.4%), a benchmark for real autonomous command execution with a three-hour timeout, and leads decisively on LiveCodeBench (93.5% versus 88.8%). Meanwhile, according to buildfastwithai.com, Claude keeps a narrow edge on SWE-bench Verified (80.8% versus 80.6%) and more significant leads on the factual-accuracy benchmark HLE and the mathematical HMMT 2026.

"V4 delivers GPT-5-class performance at roughly 1/10 the cost"

NxCode, analytical review of DeepSeek API pricing, April 2026

According to independent researcher Simon Willison, who tested both models via OpenRouter, V4-Pro is the largest open model to date: larger than Kimi K2.6 (1.1T) and GLM-5.1 (754B), and more than twice the size of the previous DeepSeek V3.2 (685B).

Pricing That Changes the Equation

DeepSeek set the following API rates: Flash — $0.14 per million input tokens and $0.28 for output; Pro — $1.74 and $3.48 respectively. For comparison: V4-Flash costs 12.4 times less than Pro, while on SWE-bench Verified it lags by only 1.6 percentage points (79.0% versus 80.6%).
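To make those rates concrete, here is a minimal cost sketch using the prices quoted above; the workload volumes are hypothetical:

```python
# Back-of-the-envelope cost comparison using the quoted API rates.
RATES = {  # USD per 1M tokens: (input, output)
    "v4-flash": (0.14, 0.28),
    "v4-pro":   (1.74, 3.48),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Cost in USD for a workload of input_m / output_m million tokens."""
    inp, out = RATES[model]
    return input_m * inp + output_m * out

# Hypothetical agent pipeline: 500M input / 100M output tokens per month
for model in RATES:
    print(f"{model}: ${monthly_cost(model, 500, 100):,.2f}")
```

On that hypothetical workload, Flash comes to about $98 a month against roughly $1,218 for Pro, the same 12.4x gap as the per-token rates.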

For a company building an agentic workflow — for example, automated code review or analysis of thousands of documents daily — the difference in token costs is not an academic question but a budget line item. NxCode analysts calculated that DeepSeek's R1 costs 27 times less than an equivalent reasoning model from OpenAI. V4 continues this logic.

The Brookings Institution, back in early 2025 following DeepSeek-R1's release, documented a broader structural effect: DeepSeek proved that breakthrough models can be built not only by Big Tech with unlimited budgets, but by teams that systematically optimize open-source contributions. The company grew out of a hedge fund that used AI for trading decisions — and it was the ability for engineering optimization, rather than raw capital, that became its competitive advantage.

Openness as Strategy

The MIT license means any company can take V4-Pro (865 GB on Hugging Face) or Flash (160 GB) and deploy its own infrastructure stack — with no royalties and no dependence on a third-party API provider. This is a direct challenge to the closed models of OpenAI and Anthropic, whose monetization is built precisely on API access.
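Hosted and self-hosted serving stacks for open-weight models commonly expose an OpenAI-compatible chat endpoint, so a request against a self-deployed instance might look like the following sketch. The base URL, model ID, and API key are illustrative assumptions, not confirmed values:

```python
import json
from urllib import request

# Sketch of a chat request against an OpenAI-compatible endpoint.
# BASE_URL and MODEL are hypothetical; substitute the values for your
# own deployment.
BASE_URL = "http://localhost:8000/v1"  # e.g. a self-hosted server
MODEL = "deepseek-v4-flash"            # hypothetical model ID

payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Summarize the attached document corpus."},
    ],
    "max_tokens": 1024,
}

req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_KEY",
    },
)
# resp = request.urlopen(req)  # uncomment against a live endpoint
print(payload["model"])
```

Because the interface mirrors the one closed providers use, switching an existing pipeline to a self-hosted open model is often a matter of changing the base URL and model name.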

At the same time, DeepSeek officially cautions that the models are released in preview status, that no Jinja chat template ships with them yet, and that the legacy endpoints deepseek-chat and deepseek-reasoner will be fully disabled on July 24, 2026.

If the full release of V4-Pro confirms the preview's benchmark results, and if Unsloth or other teams ship quantized variants suitable for local deployment on consumer hardware, the question is not whether the enterprise AI market will change, but how long closed providers can count on customer loyalty to outweigh a 10-fold price difference.
