Hero image for DeepSeek's 75% Price Cut: What It Costs You Now
By AI Tool Briefing Team

DeepSeek's 75% Price Cut: What It Costs You Now


DeepSeek did the thing that vendors do not normally do. On May 22, 2026, the team announced that the 75% off promo on V4-Pro API pricing — originally set to roll back on May 31, 2026 at 15:59 UTC — would not roll back at all. The discounted rate is now the permanent list price.

Read that one more time. The promotional rate is the list rate. The list rate that was on the pricing page two weeks ago has been deleted. V4-Pro now lists at $0.435 per million input tokens (cache-miss), $0.003625 per million on cache hits, and $0.87 per million output tokens.

That is not a pricing decision. That is a structural reset on what frontier-class inference is supposed to cost.

The reason matters more than the number. DeepSeek did not cut prices because Alibaba’s Qwen team forced them to. They cut prices because V4-Pro genuinely runs at a small fraction of the compute its predecessor required, and the company decided to pass that efficiency through to the API line item rather than bank it as margin. That decision puts a roughly 6-7x cost gap between DeepSeek and US frontier labs at production scale, and it forces every enterprise procurement team to redo math they thought was settled.

Quick Summary: DeepSeek V4-Pro’s Permanent Price Cut

DetailInfo
Date announcedMay 22, 2026
Primary sourceDeepSeek API Pricing Docs
What changed75% promo discount made permanent — list price now 1/4 of original
New V4-Pro input (cache-miss)$0.435 / MTok
New V4-Pro input (cache-hit)$0.003625 / MTok
New V4-Pro output$0.87 / MTok
Cache-hit pricing1/10 of launch price (effective April 26, 2026)
Cost vs Claude Opus 4.7~1/6 on blended workload (reporting)
Cost vs GPT-5.5~1/7 on blended workload
Compute efficiency vs V3.227% of single-token inference FLOPs
KV cache memory vs V3.210% of prior burden

Bottom line: A permanent price floor at this level is not a promo — it’s a public commitment to a cost basis the US labs cannot match without changing their architecture. Every API contract priced before May 22 is now on the wrong side of a 6-7x gap.

What Actually Happened on May 22

Per TheNextWeb’s coverage, DeepSeek announced that the V4-Pro discount — running since the model’s general availability in late April 2026 (April 24, 2026) — would convert from a time-limited promotion to the official permanent price. The exact mechanism is administrative. The 75% off the original list price stays on. The “ends May 31, 2026” timer was removed. The pricing page now shows the discounted figures as the only rate.

InfoWorld’s analysis framed it as an escalation in the AI pricing war. That’s half right. The other half — and the one that matters more for builders — is that DeepSeek is signaling something about its underlying cost basis. You do not make a price cut permanent unless your gross margin still works at the new price. The implication is that V4-Pro inference, at scale, costs DeepSeek meaningfully less than V3-class inference cost the company in 2025.

The architecture data backs the implication. Per reporting on the V4 model card, V4-Pro hits roughly 27% of the single-token inference FLOPs that V3.2 required at 1M-token context, with the KV cache memory burden dropping to about 10% of prior levels. The Compressed Sparse Attention and Heavily Compressed Attention layers underneath the MoE routing are doing real work. The result is that per-token cost on DeepSeek’s own infrastructure dropped on a step function. The pricing reset is the customer-facing consequence.

That’s the part the headlines miss. The other Chinese labs — Alibaba, Moonshot, Zhipu — can match the price on a competitive basis, but they cannot match the cost basis without a comparable architectural shift. The US frontier labs are even further from that math. Their models are dense (mostly), their inference stacks are tuned for higher per-token economics, and their pricing power is anchored to enterprise contracts that don’t reprice every quarter.

The Real Cost Gap at Production Scale

Here’s what the new pricing looks like against the alternatives on a 100M-output-token monthly workload with a typical 4:1 input-to-output ratio. Numbers reflect public list rates as of late May 2026.

ModelInput ($/MTok)Output ($/MTok)400M input + 100M output cost
DeepSeek V4-Pro$0.435$0.87~$261
Claude Opus 4.7$5.00$25.00~$4,500
GPT-5.5$5.00$30.00~$5,000

The blended ratio at this volume lands at roughly 17x cheaper than Opus 4.7 and 19x cheaper than GPT-5.5 on cache-miss pricing. The reason the more commonly cited figure is “6-7x” — per VentureBeat and MindStudio’s comparison — is that real production traffic on the US labs has caching, prompt optimization, and tier discounts already applied. The 17x is the headline number. The 6-7x is what you’ll actually see in a procurement model that compares apples to apples.

Either way, the gap is structural, and it’s now anchored.

The cache-hit story is the part that should worry US lab pricing teams more than the headline cut. Per DeepSeek’s pricing details, cache-hit input on V4-Pro is now $0.003625 per million tokens — one-tenth of the launch rate, effective April 26, 2026. Production agentic workloads are dominated by repeated requests against the same system prompt and tool definitions. That pattern lights up cache hits constantly. At three-tenths of a cent per million cached tokens, the marginal cost of agent loops on DeepSeek rounds to zero for the input side. Anthropic and OpenAI both have prompt-caching mechanisms, but the discount factors are smaller and the rates after caching still sit well above DeepSeek’s cache-miss price. The cache-hit pricing is the part of this announcement that quietly tilts the agentic-app math the most.

Why Enterprise Procurement Teams Should Reopen Every Contract

Three reasons the May 22 announcement is a reopen-the-contract event, not a watch-it event.

1. The cost basis just got publicized

A permanent price floor is a published cost reference. Every Anthropic, OpenAI, and Google account team has a CFO who reads InfoWorld. The next time an enterprise procurement team comes to renewal with a competing-vendor quote in hand, the floor on the other side of the table is now visible to everyone. That changes the negotiation. Vendors who were previously holding firm at $5/$25 or $5/$30 pricing now have to defend a gap that didn’t exist on the public pricing page two weeks ago.

You don’t have to actually move workloads to DeepSeek to get the leverage. You just have to credibly threaten to model the savings. The lower the threshold for “credible threat,” the better your renewal terms.

2. The compliance shape just got more complicated, not simpler

DeepSeek’s price advantage runs straight into the same Chinese vendor-risk constraint we wrote about in the talent travel restriction analysis last week. Federal contractor workloads, regulated-industry workloads, and PII-touching workloads on Chinese model APIs are increasingly hard to justify. The price cut does not change that.

What it does change is the cost of saying no. A general counsel who blocks DeepSeek for compliance reasons in May 2026 is now blocking a vendor that’s 6-7x cheaper than the approved alternatives. That’s a defensible call for regulated workloads. It’s a harder call for non-regulated internal tooling, developer experimentation, batch workloads, and offline use cases where the data residency story doesn’t matter. Expect the compliance line to move — not toward “all Chinese AI is approved,” but toward “approved for these specific use cases, with these specific guardrails.” The procurement-policy memo for Q3 2026 needs to be written, not delayed.

3. Cache pricing rewrites agent economics

The agentic stack — autonomous coding agents, customer support agents, research agents, browser-control agents — is the workload where DeepSeek’s pricing is most disruptive. Per the enterprise AI deployment guide, agent loops typically use 50-100 tool calls per task, with each call re-sending a substantial system prompt and tool schema. On cache-hit pricing, that’s effectively free input. On output, the gap to the US labs is still 30-35x.

For teams running Claude Code or OpenAI’s coding agents at scale, the relevant question isn’t “can DeepSeek replace Claude on our hardest tasks.” The answer to that is probably no — Opus 4.7 still wins on complex multi-step reasoning, and the V4 review we published in March walks through where the gap shows up. The relevant question is “what percentage of our agent calls are routine enough to route to DeepSeek instead, and what does that save us.” For most enterprise agentic deployments, that percentage is 40-70%, and the math at this pricing is meaningful enough that the engineering work to set up model routing pays back inside a month.

What Smart API Buyers Should Do This Week

Five concrete moves, ordered roughly by payoff per hour of work.

  1. Re-audit your token mix. Pull last quarter’s usage logs by use case. Tag each workload by sensitivity (regulated/not), complexity (frontier reasoning required/not), and latency requirements. Anything that lands in the “non-regulated, routine, latency-flexible” bucket is a candidate for DeepSeek routing. The AI cost optimization guide has the categorization framework.
  2. Run a side-by-side cost model for top three use cases. Per DevTk.AI’s cost calculator, the breakeven on Claude Opus 4.7 vs DeepSeek V4-Pro flips somewhere around 50M monthly tokens for most workloads. Below that, the migration cost outweighs the savings. Above that, the math gets harder to argue with each month.
  3. Open the renewal conversation early. If your Anthropic or OpenAI contract is up in Q3 or Q4, start the negotiation now. The public pricing data point is fresh. The account team’s flexibility is highest before the quote becomes uncomfortable for both sides. Multi-year commitments at modestly improved rates are easier to secure in May than in August.
  4. Stand up a model router for non-critical traffic. A simple OpenRouter or self-built routing layer that sends 30-50% of routine API traffic to DeepSeek (with fallback to Anthropic or OpenAI on failure) is two to three engineering days. The savings at 100M+ monthly tokens cover the work in a single billing cycle. The pricing comparison piece covers the routing patterns that work.
  5. Document the compliance line explicitly. The fastest way to lose the savings is to have a fuzzy policy that lets every team make its own call about whether DeepSeek is allowed for their workload. Write the policy, get it signed off, communicate it. The enterprise AI deployment checklist has the policy template that fits this case.

How Anthropic and OpenAI Will Respond

The price cut puts both US frontier labs in an awkward spot. Neither can match DeepSeek’s published rates without re-architecting inference economics that they’ve spent two years building margin into. Both have other levers.

Anthropic’s most likely response is to accelerate the cache discount story rather than touch the headline rate. Claude already has prompt caching, and the discount factor has room to grow. Per our earlier pricing comparison, Anthropic has held per-token pricing steady for two quarters and is unlikely to move it before the October IPO. What Anthropic can do is widen the cache discount, expand the batch API rate, and lean harder on enterprise developer-seat pricing for Claude Code, where the value capture is per-engineer rather than per-token.

OpenAI has more room to move on price because the GPT-5.5 cost basis is improving with each Blackwell-generation deployment. Expect a targeted cut on input pricing for high-volume enterprise tiers, possibly bundled with extended cache discounts, sometime in the next two quarters. The political read is that OpenAI cannot let the cost gap become the lead in every procurement conversation through 2027 without ceding the developer market it spent 2024 and 2025 building.

Google is the interesting third wheel. Gemini 3.x pricing has been the most aggressive of the US labs all year, and Google has the most room to push lower because Gemini inference runs on TPUs at a different cost basis than the GPU stacks at Anthropic and OpenAI. A targeted Gemini Pro price cut to undercut DeepSeek on regulated workloads — where DeepSeek can’t compete — would be a smart play. Watch for it before the next Google Cloud Next.

The Bigger Picture: This Is the Pricing War’s New Floor

For two years the AI pricing conversation has been about whether the US labs would defend per-token economics or be forced to compete on price. May 22 is the first day the question stopped being theoretical. DeepSeek published a floor. The other Chinese labs will match it within the quarter. The US labs will respond with something more creative than a matching cut, because matching is structurally hard for them.

The downstream effect is that the value proposition of US frontier models — Opus 4.7, GPT-5.5, Gemini 3 Pro — has to be defended on grounds other than price. That’s a defensible position. Frontier reasoning, enterprise integrations, compliance posture, audit logs, support quality, and platform stability are all legitimate reasons to pay 6-7x more for a token. None of those reasons hold for every workload. The procurement question for 2026 and 2027 becomes which subset of your token volume genuinely requires the frontier and which subset is just paying for it out of inertia.

The other piece of the bigger picture is the cost-basis arms race. DeepSeek’s price cut is a public claim that its architecture is cheaper to run. That claim will pressure every US lab to publish or reveal its own efficiency story over the next 12 months. Expect new architecture papers, expect more emphasis on inference benchmarks, and expect the conversation about per-token compute cost to move from a back-channel topic to a front-page one.

Our Take

The mistake to avoid is reading this as a Chinese-versus-US story. It’s a price-versus-value story with a Chinese vendor as the price disruptor and US vendors as the value defenders. That framing keeps procurement decisions in the right place — workload by workload, against the right cost basis on each side — instead of getting derailed into geopolitical theater.

The second mistake is treating the announcement as a deal you have to grab this week. The 75% cut is permanent. It’s not going away if you wait two weeks. What’s worth doing this week is the audit work that lets your team make a clean decision once the broader response from the US labs starts to land. By August, you’ll have new Anthropic enterprise pricing on the table, a likely OpenAI input-price adjustment, and a possible Gemini enterprise push. The right move now is to build the routing infrastructure and the policy framework that lets you take advantage of whichever combination of price cuts and credits lands in your favor.

The third mistake — the one that costs the most over a 12-month horizon — is assuming the US labs will match. They won’t. The cost basis is different, the strategic posture is different, and the customer base they’re optimizing for can absorb the premium. The pricing gap is real and it will stay real. The teams that build a multi-vendor stack now will keep more margin in 2027 than the teams that wait for a single vendor to solve the problem for them.

Frequently Asked Questions

What exactly changed on May 22, 2026?

DeepSeek announced that the 75% off promo on V4-Pro API pricing — originally set to expire May 31, 2026 at 15:59 UTC — is now the permanent list price. Per the official pricing docs, V4-Pro now lists at $0.435 per million input tokens (cache-miss), $0.003625 per million cache-hit, and $0.87 per million output tokens.

How does that compare to Claude Opus 4.7 and GPT-5.5?

At standard list rates and a typical 4:1 input-to-output ratio on a 100M output token workload, DeepSeek V4-Pro lands at roughly $261 per month versus ~$4,500 for Claude Opus 4.7 and ~$5,000 for GPT-5.5. The blended figure most procurement teams cite — after accounting for cache discounts and enterprise tier rates on the US side — is closer to a 6-7x gap.

Why did DeepSeek make the cut permanent?

The structural reason: V4-Pro runs at roughly 27% of the single-token inference FLOPs of its predecessor (V3.2) and uses about 10% of the KV cache memory, per the architecture reporting. The company can hold gross margin at the new price because its cost basis dropped on a step function with the V4 architecture. The strategic reason: a permanent floor commits the price publicly and forces competitors to defend a visible gap.

Should I move my production workload off Anthropic or OpenAI?

For regulated workloads, federal contractor work, or anything touching customer PII, the answer is no — the Chinese AI vendor risk story sharpened further this month and likely outweighs the savings. For non-regulated, routine API traffic where complex frontier reasoning isn’t required, DeepSeek is now hard to ignore. Most enterprises will end up with a routing layer that sends 30-50% of traffic to DeepSeek and keeps the rest on US vendors.

Will Anthropic and OpenAI match the price?

Probably not on the headline rate. Both companies are more likely to respond with expanded cache discounts, batch API pricing, and enterprise-tier concessions rather than a matching cut. Google may move more aggressively on Gemini pricing because its TPU-based inference stack has more cost-basis flexibility than the GPU stacks at Anthropic and OpenAI. Expect targeted responses inside the next two quarters.

What’s the cache-hit pricing story about?

Per the DeepSeek pricing details page, the cache-hit input rate was reduced to 1/10 of the launch price effective April 26, 2026. On V4-Pro, that’s $0.003625 per million tokens. The pricing targets agentic workloads where the same system prompt and tool definitions get sent repeatedly. At this rate, the input cost of agent loops effectively rounds to zero — a far bigger structural shift than the headline 75% cut on cache-miss input.

Does the price cut change DeepSeek’s underlying capability?

No. The model is the same V4-Pro that launched on April 24, 2026. The original review of V4 and the DeepSeek vs ChatGPT comparison still capture where the model is strong (coding, structured tasks, long-context retrieval, cost-sensitive deployments) and where it isn’t (frontier multi-step reasoning, hard scientific problems, novel synthesis tasks). The pricing decision changes the procurement math, not the benchmark numbers.

What should I watch over the next 90 days?

Three things. First, whether Alibaba’s Qwen team matches or undercuts DeepSeek’s new pricing — that confirms the floor across Chinese frontier labs. Second, whether Anthropic or OpenAI announces expanded cache discounts or batch API price cuts. Third, whether the US compliance posture on Chinese AI tightens in a way that narrows the enterprise market for DeepSeek. The interaction of those three will determine whether the 6-7x gap holds, narrows, or widens by the end of Q3.


Last updated: May 30, 2026. Sources: DeepSeek API Pricing · DeepSeek Pricing Details (USD) · TheNextWeb on the permanence announcement · InfoWorld on the pricing war · VentureBeat on V4 cost comparison · MindStudio cost analysis · DevTk.AI pricing guide · V4 architecture reporting.

Related reading: DeepSeek V4 Review · DeepSeek vs ChatGPT · AI Pricing Comparison 2026 · AI Cost Optimization Guide · Enterprise AI Deployment · Claude Opus 4.7 Review · China Locks Down AI Talent · Anthropic vs OpenAI in 2026