DeepSeek V4 Review: Hands-On Testing (2026)
DeepSeek just changed what "open-source" means at the frontier level.
V4 launches this month with 1 trillion parameters, native multimodal across text, images, video, and audio, and a 1 million token context window. It runs on hardware you can actually buy. That last part is what makes this different from every other "frontier-class" model announcement.
I've spent time with the preview builds and the technical reports. Here's what's real, what's marketing, and whether it belongs in your workflow.
Quick Verdict: DeepSeek V4
| Aspect | Rating |
|---|---|
| Overall Score | ★★★★½ (4.5/5) |
| Best For | Cost-sensitive teams, local deployment, multimodal workloads |
| API Pricing | ~$0.10–$0.30 per million input tokens |
| Context Window | 1M tokens |
| Multimodal | Text, image, video, audio |
| Open Source | Yes (weights available) |
| Local Hardware | Dual RTX 4090 or single RTX 5090 |

Bottom line: DeepSeek V4 is the most capable open-source model ever released, and its pricing makes GPT-5 look like a luxury tax. The MoE architecture means real-world performance is fast. The Engram Memory System is genuinely novel. The main caveat: it's brand-new, so enterprise trust has to be earned.
Most trillion-parameter announcements are theatre. Nobody is running 1 trillion active parameters per token; the compute cost would be catastrophic.
DeepSeek V4 does what every serious large model does: uses Mixture-of-Experts (MoE) architecture. The model has 1 trillion total parameters across its expert layers, but only 32 billion activate per token. That's the number that actually matters for speed and hardware requirements.
This architecture is why the model can run at interactive speeds, on attainable hardware, at the prices covered below.
The 1 trillion parameter count isn't misleading (larger parameter counts in MoE models genuinely do improve quality), but 32B active parameters is closer to how this model actually feels in use. For context, that's similar to the active compute of Llama 3.3 70B, but with a dramatically larger expert pool to draw from.
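To make the total-vs-active distinction concrete, here is a toy sketch of top-k expert routing in the style MoE layers use. Everything here (layer sizes, 16 experts, k=2 routing, the gating scheme) is illustrative, not DeepSeek's published configuration:

```python
import numpy as np

def moe_layer(x, expert_weights, gate_weights, k=2):
    """Route a token vector to its top-k experts and mix their outputs.

    Only k experts actually run per token, so per-token compute scales
    with k, not with the total number of experts held in memory.
    """
    scores = x @ gate_weights            # one routing score per expert
    top_k = np.argsort(scores)[-k:]      # indices of the k best experts
    mix = np.exp(scores[top_k])
    mix /= mix.sum()                     # softmax over the chosen experts only
    return sum(w * (x @ expert_weights[i]) for w, i in zip(mix, top_k))

rng = np.random.default_rng(0)
d, n_experts = 64, 16
x = rng.standard_normal(d)
experts = rng.standard_normal((n_experts, d, d))  # all experts live in memory
gates = rng.standard_normal((d, n_experts))
out = moe_layer(x, experts, gates)  # only 2 of the 16 experts ran
```

The ratio in the sketch (2 active of 16 total) mirrors V4's claimed 32B active of 1T total: memory scales with the full expert pool, compute with the active slice.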
If you want to understand why this architecture matters for pricing, our guide to AI model architectures breaks it down in plain language.
Context windows are often quoted and rarely used well. GPT-5 caps at 128K. Gemini 3.1 Pro leads the field at 2M.
DeepSeek V4's 1M token window sits between those two. Unlike some 1M window implementations, DeepSeek's technical report suggests attention quality holds reasonably well across the full range.
What 1M tokens actually means in practice:
| Content Type | Approximate Token Count |
|---|---|
| Novel (300 pages) | ~120,000 tokens |
| Full codebase (medium project) | ~200,000–400,000 tokens |
| Year of email history | ~500,000 tokens |
| Enterprise document vault | ~1M tokens |
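The figures above line up with a common rule of thumb of roughly four characters per English token. A quick estimator for checking whether a corpus fits in the window (a heuristic only; DeepSeek's actual tokenizer will count differently):

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_window(texts, window=1_000_000, reply_budget=8_000):
    """Check whether a set of documents fits a context window,
    leaving room for the model's reply."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reply_budget <= window, total

# A 300-page novel at ~1,600 characters per page lands near the
# ~120K-token figure in the table above.
novel = "x" * (300 * 1_600)
ok, total = fits_in_window([novel])
```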
For legal discovery, financial audits, or large codebase analysis, V4's context window is legitimately useful. The question of retrieval quality at 800K+ tokens is one I'll update once longer-duration testing is possible. The ceiling is there.
If you're doing this kind of deep document work, also look at our RAG vs. long context explainer. Sometimes a retrieval pipeline still beats brute-force context stuffing.
DeepSeek V4 was trained on all four modalities from the start, not assembled from separate specialized models.
This matters in a specific way: inter-modal reasoning. When you ask "what's happening in this video and how does it relate to the document I uploaded?", a natively multimodal model handles that differently than a text model bolted onto a vision API.
What I've tested:
Where V4 doesn't obviously lead the field: pure image generation quality is not its selling point. For generated visuals, Midjourney and DALL-E 3 are still better bets. V4 is a comprehension model that happens to understand multiple modalities, not primarily a generation model.
Most "memory" features in AI are glorified key-value stores: you tell it to remember something, it stores a summary, retrieves it later. Useful, but not sophisticated.
DeepSeek V4's Engram Memory System works differently. It performs selective context retention: the model identifies which parts of prior conversations are likely to remain relevant and retains them at higher fidelity, rather than treating all historical content equally.
In practice, this means:
I've seen this type of selective attention described in research papers, but V4 appears to be the first production deployment of something like it at this scale. Whether it works as well in practice as in the technical report is something extended testing will settle. The architecture is worth watching.
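Engram's internals aren't public, so treat the following as a toy illustration of the general idea (keep high-relevance history at full fidelity, degrade the rest), not DeepSeek's actual mechanism:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    text: str
    relevance: float  # in a real system this would be model-predicted

def retain(history, budget_chars):
    """Keep high-relevance turns verbatim; reduce the rest to short stubs.

    This mimics selective context retention: history is not stored
    at uniform fidelity, so the budget goes to what matters.
    """
    kept, used = [], 0
    for turn in sorted(history, key=lambda t: -t.relevance):
        if used + len(turn.text) <= budget_chars:
            kept.append(turn.text)             # full fidelity
            used += len(turn.text)
        else:
            kept.append(turn.text[:20] + "…")  # low-fidelity stub
    return kept

history = [
    Turn("User prefers Python and deploys on AWS Lambda.", 0.9),
    Turn("Small talk about the weather last Tuesday.", 0.1),
]
memory = retain(history, budget_chars=60)  # preferences survive, chit-chat shrinks
```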
This is the headline that matters for anyone running AI at any real volume.
Projected API pricing:
| Model | Input per 1M tokens | Output per 1M tokens |
|---|---|---|
| GPT-5 | $8 | $24 |
| Claude Opus 4.6 | $15 | $75 |
| Gemini 3.1 Pro | $3.50 | $10.50 |
| DeepSeek V4 | ~$0.10–$0.30 | ~$1–$2 (est.) |
At $0.15 per million input tokens, you're looking at roughly 50x cheaper than GPT-5 for a model that's competitive on most benchmarks.
For a team running 50 million input tokens per month through a product (not unreasonable for a mid-size SaaS app), that's the difference between a roughly $8/month API bill on V4 and a roughly $400/month one on GPT-5, before output tokens are even counted.
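The arithmetic is easy to script with the projected rates from the table above (the 50M input / 10M output split and the $1.50 output midpoint are illustrative assumptions):

```python
def monthly_cost(input_tokens_m, output_tokens_m, in_rate, out_rate):
    """API cost in dollars; rates are quoted per million tokens."""
    return input_tokens_m * in_rate + output_tokens_m * out_rate

# 50M input + 10M output tokens per month, per-million rates from the table
gpt5 = monthly_cost(50, 10, in_rate=8.00, out_rate=24.00)  # 640.0
v4   = monthly_cost(50, 10, in_rate=0.15, out_rate=1.50)   # 22.5
ratio = gpt5 / v4  # ~28x on this mixed workload
```

Note the mixed-workload ratio is smaller than the headline 50x, because the output-price gap (16x) is narrower than the input-price gap.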
The catch: these are projected prices. Actual production pricing at launch may vary. Watch the DeepSeek pricing page for confirmed rates.
For a full breakdown of how to optimize AI spend across models, our AI cost optimization guide has the framework.
DeepSeek's claim that V4 runs on consumer hardware is the kind of thing that requires unpacking.
What you actually need:
| Setup | VRAM | Estimated Cost | Notes |
|---|---|---|---|
| Dual RTX 4090 | 48GB combined | ~$3,200 | 4-bit quantized, viable |
| Single RTX 5090 | 32GB | ~$2,000–2,500 | 4-bit quantized, tight |
| 3x RTX 4090 | 72GB combined | ~$4,800 | More headroom |
| A100 (cloud) | 80GB | ~$2–3/hr | Best performance |
"Consumer hardware" is doing some work in that sentence. Dual RTX 4090s are not exactly a gaming-rig budget. But compared to the multi-GPU server farms required for running GPT-4-class models locally, it's a meaningful accessibility jump.
For solo developers or small teams wanting to run their own inference, V4 is now plausibly in scope in a way that GPT-5 and Claude Opus 4.6 are not. If you're curious about the broader ecosystem of models you can run locally, our local LLMs guide covers the landscape.
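The 4-bit requirement falls out of simple arithmetic: weight memory is parameters × bits ÷ 8. A sketch of the active-parameter footprint only; serving a full MoE also needs the inactive experts somewhere (system RAM, NVMe streaming), which is the usual caveat with these consumer setups:

```python
def weight_gb(params_billion, bits):
    """GB of memory needed for model weights at a given quantization width."""
    return params_billion * 1e9 * bits / 8 / 1e9

# V4's 32B *active* parameters per token
active_fp16 = weight_gb(32, 16)  # 64 GB: over a dual-4090's 48GB of VRAM
active_4bit = weight_gb(32, 4)   # 16 GB: fits, with headroom for KV cache
```

This is why every consumer row in the table above says "4-bit quantized": at 16-bit precision even the active slice alone would not fit.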
The comparison everyone wants.
Where V4 clearly wins: Price, local deployment, open-source access, context window vs. GPT-5.
Where V4 is still catching up: Coding benchmarks (Claude still leads), trust and production track record, enterprise compliance tooling.
The honest version: For new AI-native products where cost is a significant constraint, V4 is worth serious consideration. For enterprises with existing Claude or OpenAI deployments and established compliance workflows, switching costs are real.
For more context on how these models stack up, see our AI models comparison guide.
Skepticism is warranted in a few areas:
1. Production track record. V4 just launched. The API reliability, uptime, and support quality of DeepSeek's infrastructure hasn't been stress-tested at scale. GPT-5 and Claude have years of production hardening. That gap matters for anything customer-facing.
2. Benchmark-to-practice translation. Benchmark scores from the technical report look competitive. Real-world task performance for specialized domains (legal, medical, financial) takes months of community testing to properly calibrate.
3. Data privacy and compliance. For enterprises in regulated industries, using DeepSeek's hosted API involves the same data sovereignty questions that apply to any third-party model. Hosting V4 locally solves this, but adds infrastructure overhead.
4. Coding at the frontier. Claude Opus 4.6 and GPT-5 remain ahead on complex multi-file code generation. V4 is strong, but if your primary workload is software development, the incumbents still have an edge.
Good fit:
Not the right fit right now:
Option 1: API Access
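DeepSeek's existing API is OpenAI-compatible, so a V4 call will likely look like the sketch below. The `deepseek-v4` model name is an assumption until launch docs confirm it; this builds the request without sending it:

```python
import json
import urllib.request

def build_request(prompt, api_key, model="deepseek-v4"):
    """Build an OpenAI-style chat-completions request for DeepSeek's API."""
    return urllib.request.Request(
        "https://api.deepseek.com/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 512,
        }).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Summarize this contract.", api_key="sk-...")
# urllib.request.urlopen(req) would send it; we only construct the payload here.
```

Because the shape matches the OpenAI API, most existing SDKs should work by pointing their base URL at DeepSeek.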
Option 2: Local Deployment
Option 3: Hosted Platforms Third-party hosts like Together AI, Fireworks, and Replicate typically support major DeepSeek releases within weeks of launch.
DeepSeek V4 changes what's possible with open-source AI. The combination of frontier-class capability, 50x price advantage over GPT-5, 1M token context, native multimodal, and local deployability doesn't exist anywhere else.
The question isn't whether it's impressive. It clearly is. The question is whether it's ready for your specific use case right now.
My honest take: If you're building something new and haven't committed to an API provider, start with V4. The cost math alone justifies the evaluation. If you have production workloads running on GPT-5 or Claude, watch the community benchmarks for the next 60 days before switching.
The AI cost landscape just changed. That's worth paying attention to.
The weights are open-source, so you can download and run them at no cost on your own hardware. The hosted API is paid, with projected pricing around $0.10–$0.30 per million input tokens. That's dramatically cheaper than OpenAI or Anthropic, but not free.
With dual RTX 4090s (48GB combined VRAM) using 4-bit quantization, yes. It's achievable but requires specific hardware. A single RTX 4090 with 24GB won't cut it for the full V4 model.
V4 is a different model family focused on multimodal breadth and cost efficiency. R-series models were reasoning-focused. V4 is the more general-purpose deployment model. Both are worth evaluating depending on your use case.
For self-hosted deployments, it's as safe as any open-weight model you control. For the hosted API, standard third-party data handling questions apply. Enterprise compliance certifications are still early. Watch for SOC 2 and GDPR documentation as the model matures in production.
The 1 trillion total parameters live across the MoE expert layers. Only 32 billion activate per forward pass. Think of it like a team of 1,000 specialists: you only call 32 for any given task, but the breadth of the full team improves quality. It's why V4 can run on consumer hardware despite the headline number.
Initial reports suggest quality holds reasonably well through ~800K tokens, with some degradation at the extreme end. This is notably better than GPT-5's 128K hard limit and worth testing against Gemini 3.1 Pro's 2M for specific document-heavy workloads.
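That 800K claim is testable yourself. A minimal needle-in-a-haystack probe plants one fact inside long filler text and checks whether the model can retrieve it at various depths; here is the prompt-construction half (the model call and the filler/needle text are up to you):

```python
import random

def build_probe(needle, approx_tokens=800_000, seed=42):
    """Plant one fact at a random depth inside filler text.

    Send the result to the model, check whether its answer contains
    the needle, then sweep depths and lengths to map retrieval quality.
    """
    random.seed(seed)
    filler = "The quick brown fox jumps over the lazy dog. "
    n_sentences = (approx_tokens * 4) // len(filler)  # ~4 chars per token
    depth = random.randrange(n_sentences)
    doc = [filler] * n_sentences
    doc[depth] = f"IMPORTANT: {needle} "
    prompt = "".join(doc) + "\nQuestion: repeat the IMPORTANT fact verbatim."
    return prompt, depth / n_sentences  # relative depth of the needle

prompt, rel_depth = build_probe("The vault code is 7291.")
```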
DeepSeek's proprietary selective context retention system. Rather than storing all conversation history equally, Engram identifies high-value context and preserves it at higher fidelity across sessions. It's architecturally more sophisticated than standard memory implementations, though production validation is still early.
Last updated: March 2026. Pricing projections based on DeepSeek's pre-launch technical documentation. Verify current rates at platform.deepseek.com before making budget decisions.