Claude Design Review: Figma Just Got a Problem
On April 16, Anthropic shipped Claude Opus 4.7: their new flagship, same price as Opus 4.6, measurably better at code. The benchmark they led with: 87.6% on SWE-bench Verified, a real jump over 4.6. The context window: 1 million tokens, now the default. And the other thing they said, on the same day, in the same news cycle: we have a better model than this one, and we're not selling it to you. Claude Mythos scores 93.9% on SWE-bench and 97.6% on USAMO, and it's only available to the Project Glasswing partner cohort because it can find and exploit production zero-days at a level that makes a general release indefensible.
So here's the question this review needs to answer. When your vendor publicly says "this is our second-best" on launch day, what exactly are you buying?
Quick Verdict
| Aspect | Rating |
|---|---|
| Overall Score | ★★★★☆ (4.4/5) |
| Best For | Production coding workflows, long-context analysis, agent pipelines |
| Pricing | $5 / $25 per million input/output tokens (same as Opus 4.6) |
| SWE-bench Verified | 87.6% (Mythos: 93.9%) |
| Context Window | 1M tokens (default, not a beta flag) |
| Coding Ability | ★★★★★ |
| Value vs. Opus 4.6 | ★★★★★ |

Bottom line: A genuine, priced-for-free upgrade over Opus 4.6. If you already run Claude for coding or long-context work, 4.7 is a no-brainer drop-in. The awkward part is what the same launch said about the ceiling you're not allowed to reach.
Three things matter on the spec sheet, and only three.
87.6% on SWE-bench Verified. SWE-bench Verified is the coding benchmark that isn't easy to game: it measures end-to-end resolution of real GitHub issues in real open-source repos, with human-verified ground truth. Opus 4.6 was already strong here. 4.7 pushes the number meaningfully higher, and that's on the same model scaffolding (tool use, file access, agent loop) that production teams already have wired up.
1 million token context window: shipped, not previewed. Opus 4.6 introduced 1M tokens, but it showed up as a beta flag in the API and Claude Code needed a toggle to use it. In 4.7, 1M is the default context. That's a quiet but meaningful change. It means your long-context agent runs no longer require special handling, and the fallback behavior when you exceed 200K is no longer a surprise.
Same price as Opus 4.6. $5 per million input tokens, $25 per million output tokens. Anthropic could have charged a premium for 4.7 and most of their customer base would have paid it. They didn't. For enterprise teams planning the next twelve months of AI spend, that's the single most useful number on this page.
That's the offer. Better coding, context window no longer gated, price held flat. For a mid-cycle upgrade from a frontier lab, that's a clean release.
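Flat pricing also makes budget math trivial. A minimal cost-estimate sketch, using the published $5/$25 rates; the token counts in the example are illustrative, not measured figures:

```python
# Back-of-envelope cost estimate at Opus 4.7's published pricing
# ($5 per million input tokens, $25 per million output tokens).
INPUT_PRICE_PER_M = 5.00
OUTPUT_PRICE_PER_M = 25.00

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single run at the published rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a long-context run that reads 900K tokens and emits 20K.
cost = run_cost(900_000, 20_000)
print(f"${cost:.2f}")  # $5.00 (4.50 for input, 0.50 for output)
```

Because 4.6 and 4.7 share the same rates, the same arithmetic prices either model.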
Here's where reviewing a model gets structurally weird.
On the same announcement day, Anthropic said the quiet part out loud: Claude Mythos Preview, the model they're running internally, scores 93.9% on SWE-bench Verified and 97.6% on the USAMO (USA Mathematical Olympiad), a math reasoning benchmark where expert humans score in the low 90s. Those numbers aren't comparable increments over 4.7. They're a different tier.
And you can't use it. Access is restricted to the Project Glasswing program, a cohort of security-weighted enterprise partners (AWS, Microsoft, Google, CrowdStrike, Palo Alto Networks, Cisco, and others) at $25/$125 per million tokens. The reason, per Anthropic's own documentation, is that Mythos has "reached a level of coding capability where it can surpass all but the most skilled humans at finding and exploiting software vulnerabilities." During preview, the model found a 27-year-old vulnerability in OpenBSD along with critical flaws in every major operating system and browser.
I'm not arguing that's wrong. Our Mythos coverage makes the case that Anthropic's restraint here is defensible: this is a genuine dual-use capability, and shipping it to anyone with API access would be reckless.
But it's worth sitting with what the launch architecture implies. Anthropic published "our best model is too dangerous for you" and "here's our new flagship" on the same date. The flagship is 4.7. It's what you're allowed to buy. It scores six points below the internal model on the benchmark Anthropic chose to headline. That's the asterisk.
Short answer: yes, and it costs you nothing.
| Metric | Opus 4.6 | Opus 4.7 |
|---|---|---|
| SWE-bench Verified | ~82% | 87.6% |
| Context window (default) | 200K (1M as beta) | 1M |
| Input price (per 1M tokens) | $5 | $5 |
| Output price (per 1M tokens) | $25 | $25 |
| Availability | Claude API, Pro, Amazon Bedrock, Vertex AI, Foundry | Same |
The SWE-bench jump is where the practical difference lives. In agentic coding workflows (the kind Laurenzo's AMD analysis was measuring during the Claude performance incident earlier this month), a 5-6 point lift on this benchmark correlates with fewer user interrupts, tighter Read:Edit ratios, and fewer retries per completed task. It's not a cosmetic number.
Reasoning on non-coding tasks (multi-document analysis, strategic synthesis, writing) is marginally better. Not revolutionary. If you're using Opus 4.6 for legal review, market analysis, or long-form writing, you'll notice minor sharpness improvements. You won't notice a different product.
Speed is roughly the same. 4.7 uses the adaptive thinking behavior introduced in 4.6, and response latency on typical prompts is in the same range.
If you're on Opus 4.6 via the API: the model ID change is the entire migration. No new SDK behaviors, no breaking changes to tool use or caching. Your agent loops keep working.
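To make the "model ID change is the entire migration" point concrete, here is a minimal sketch of a messages-API request builder. The model ID strings are placeholder guesses, not confirmed identifiers; check Anthropic's current model list for the exact values:

```python
# Sketch of the upgrade path: with the Anthropic messages API, swapping
# the model string is the whole change. IDs below are hypothetical.
OLD_MODEL = "claude-opus-4-6"   # placeholder for the 4.6 model ID
NEW_MODEL = "claude-opus-4-7"   # placeholder for the 4.7 model ID

def build_request(prompt: str, model: str = NEW_MODEL) -> dict:
    """Assemble messages.create kwargs; only `model` differs from a 4.6 call."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }

# An existing agent loop keeps working unchanged, e.g.:
#   client = anthropic.Anthropic()
#   response = client.messages.create(**build_request("Fix the failing test"))
```

Everything else in the request (tool definitions, caching headers, system prompts) carries over as-is, which is the point of the paragraph above.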
If you're on Claude Pro at $20/month: the upgrade happens automatically when Anthropic flips the default, which they've said will roll out over the next couple of weeks.
If you're on Claude Max or the new enterprise tier with its per-seat-plus-usage pricing (see our coverage of the pricing shift), your bill changes based on consumption, not model selection, so 4.7 is effectively free to adopt.
There's no reasonable scenario where you stay on 4.6 unless you have a specific quirk of your stack that depends on 4.6's exact behavior.
Benchmarks can feel abstract. Here's what the 4.6 → 4.7 coding improvement translates to in practice, based on what production teams running agent pipelines tend to track.
Fewer retries per completed ticket. SWE-bench scores roughly track the rate at which an AI coding agent closes a real GitHub issue cleanly on first attempt. A 5-6 point lift means a measurable reduction in the "agent went off-rails and I had to intervene" failure mode.
Longer autonomous runs before a human has to step in. At 82%, an Opus 4.6 agent could handle maybe 3-5 chained tasks before drift accumulated enough that supervision became required. At 87.6%, that ceiling moves up: not to unlimited, but to a range where overnight runs on well-scoped batch jobs start being viable.
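As a rough sanity check on those ceilings, you can model a chained run with a naive independence assumption: if each task succeeds with probability p (using the SWE-bench pass rate as a crude proxy), the expected number of consecutive clean completions before the first failure follows the geometric model p / (1 − p). This is a simplification, since real task outcomes are correlated, but it shows the shape of the lift:

```python
# Naive geometric model: expected consecutive clean task completions
# before the first failure, given per-task success probability p.
def expected_clean_run(p: float) -> float:
    return p / (1.0 - p)

print(round(expected_clean_run(0.82), 1))   # 4.6 tasks at a 4.6-era pass rate
print(round(expected_clean_run(0.876), 1))  # 7.1 tasks at 4.7's pass rate
```

A ~6-point benchmark lift translating into roughly half again as many unattended steps is consistent with the "overnight batch jobs start being viable" framing.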
The Read:Edit ratio also holds better. Stella Laurenzo's forensic analysis at AMD showed that when Claude was throttled in March, the ratio collapsed from 6.6 to 2.0, meaning the model was editing code without reading enough context first. 4.7 pushes it back toward the healthy range. If you've built monitoring on this signal, you'll see it.
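If you want to watch that signal yourself, a hypothetical monitoring sketch follows. The tool-call names and the alert threshold are illustrative assumptions, not a published spec:

```python
# Hypothetical Read:Edit monitor: count tool calls in an agent
# transcript and flag sessions where the ratio collapses.
from collections import Counter

ALERT_THRESHOLD = 4.0  # illustrative; tune against your own baselines

def read_edit_ratio(tool_calls: list[str]) -> float:
    """Ratio of Read to Edit tool calls; inf if the agent never edited."""
    counts = Counter(tool_calls)
    edits = counts.get("Edit", 0)
    return counts.get("Read", 0) / edits if edits else float("inf")

session = ["Read"] * 13 + ["Edit"] * 2  # a healthy-looking transcript
ratio = read_edit_ratio(session)
print(ratio)                    # 6.5
print(ratio < ALERT_THRESHOLD)  # False -> no alert for this session
```

The useful property of this signal is that it degrades before output quality visibly does, which is what made it diagnostic in the AMD analysis.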
Is this worth the same price as 4.6? Yes, because it's the same price. Would it have been worth a premium? Probably a small one. The fact that Anthropic didn't charge one is either a competitive response to GPT-5.4's positioning or a deliberate signal to enterprise customers still metabolizing the pricing overhaul from earlier this month. Probably both.
Context window depth is one of those features that sounds marginal until you hit it.
With 1M tokens as the default, a single Claude call can now hold roughly 750,000 words: the length of a medium book, an entire production codebase of 50-100K lines, or 2,000+ pages of legal discovery material. Opus 4.6 had this capability, but behind a beta flag. 4.7 just has it on.
Practical consequences:
The counterweight: retrieval quality at 1M tokens is genuinely good but not uniform. Models still exhibit what's been called "lost-in-the-middle" behavior, where information placed near the beginning or end of a long context is retrieved more reliably than information buried in the middle. Opus 4.7 is better at this than 4.6 but isn't perfect. For critical retrieval tasks, structured context (with explicit section markers) still outperforms dumping a raw 900K-token blob.
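One way to add that structure is to wrap each document in explicit boundary markers before concatenating, so retrieval prompts can reference sections by ID. The tag format below is an illustration, not an Anthropic-prescribed scheme:

```python
# Sketch of structured context packing: name each document and wrap it
# in explicit markers instead of concatenating raw text.
def pack_context(docs: dict[str, str]) -> str:
    """Join named documents into one prompt string with explicit boundaries."""
    sections = []
    for doc_id, text in docs.items():
        sections.append(f"<section id={doc_id!r}>\n{text}\n</section>")
    return "\n\n".join(sections)

packed = pack_context({
    "deposition_3": "Q: When did you first see the invoice? ...",
    "exhibit_A": "Invoice #1142, dated ...",
})
print(packed.count("<section"))  # 2
```

The prompt can then ask for answers "citing the section id," which gives you a cheap way to audit whether a mid-context section was actually consulted.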
For a fuller comparison of how the frontier models handle long context, our AI models comparison for 2026 walks through Gemini 3, GPT-5.2, and Claude at various context depths.
The honest limitations section.
Math reasoning. Opus 4.7 is strong here but not leading. Mythos's 97.6% USAMO score isn't a number Anthropic published for 4.7, because the comparison would be unflattering. If your workflow is heavy on formal math, Gemini 3 Pro's thinking mode or GPT-5.4 in extended reasoning still have advantages.
Agentic computer use. Still a real capability (see our Claude computer use review) and improved from 4.6, but the reliability gap between "Claude can navigate a GUI" and "Claude reliably completes a 30-step GUI workflow unattended" hasn't closed.
Cost at scale. The $5/$25 pricing is competitive for frontier capability but still a premium compared to Sonnet-tier or GPT-5.2-mini. For workflows that don't need Opus-level reasoning, you're overpaying. Our AI cost optimization guide has a framework for routing work to the right model tier.
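A routing layer can be as simple as a heuristic that reserves Opus for the tasks that need it. The tier names and criteria here are assumptions for illustration, not the framework from our guide:

```python
# Illustrative model-tier router: send well-scoped, short-context work
# to a cheaper tier and reserve the Opus tier for hard or long tasks.
def pick_tier(task: dict) -> str:
    """Return a model tier name for a task description dict."""
    if task.get("needs_long_context") or task.get("difficulty") == "hard":
        return "opus"
    return "sonnet"

print(pick_tier({"difficulty": "easy"}))           # sonnet
print(pick_tier({"needs_long_context": True}))     # opus
```

Even a two-branch router like this captures most of the savings, since the expensive failure mode is defaulting everything to the top tier.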
The ceiling problem. This is the one that's hardest to quantify. You are, by Anthropic's own admission, not using their best. That's fine for most work. It's uncomfortable to think about for work where that six-point gap matters: cybersecurity analysis, complex multi-step reasoning, edge cases in scientific computing.
Opus 4.7 is a clean, honest release. Better at coding, 1M context out of the box, same price. If someone asks me whether to upgrade, the answer is yes, and the whole conversation takes about ten seconds.
What's harder is the framing Anthropic chose for launch day. Shipping "here's our new flagship" and "here's the model we won't sell you" in the same news cycle is either disciplined transparency or a subtle way to make the thing you can buy feel like a consolation prize. Probably both.
I think the Mythos restriction is the right call: the capability profile genuinely is dual-use in ways that justify gating. But the commercial geometry of "pay $5/$25 for second-best" is new territory for a frontier lab, and it's worth being honest about what that feels like. You're buying a very good model from a company that has openly said there's a better one.
For almost every practical use case, that doesn't matter. 4.7 is better than what most teams were using yesterday. The price didn't go up. The upgrade is free. The long-context default finally landed. That's a good release.
The reason this review is four stars and not five is the ceiling. Anthropic published that there's more on the table, and then priced it out of reach, for structural reasons that are defensible but that still mean what they mean. You're paying for second-best because second-best is what you're allowed to pay for.
The previous release got five stars because there wasn't an explicit better tier sitting next to it on the product page. This one has a label on it: "Claude Opus 4.7", right next to the fine print saying "Mythos Preview: restricted access only." The label does what labels do.
Claude Opus 4.7 is Anthropic's new flagship publicly available AI model, announced April 16, 2026. It scores 87.6% on SWE-bench Verified, ships with a 1 million token context window as the default, and is priced at $5 per million input tokens and $25 per million output tokens, the same pricing as the previous Opus 4.6 release.
Opus 4.7 delivers roughly a 5-6 point improvement on SWE-bench Verified over 4.6, makes the 1M context window default instead of beta, and holds pricing flat at $5/$25 per million tokens. For most production coding and long-context workflows, it's a strict upgrade at no additional cost.
No. Anthropic's own benchmarks place Mythos Preview at 93.9% SWE-bench Verified and 97.6% USAMO, meaningfully ahead of Opus 4.7. However, Mythos is restricted to Project Glasswing enterprise partners due to its expert-level vulnerability exploitation capability. Opus 4.7 is the highest-capability model Anthropic makes available for general commercial use.
Opus 4.7 is priced at $5 per million input tokens and $25 per million output tokens via Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. For Claude Pro users, 4.7 is included in the $20/month consumer subscription. Enterprise customers pay per-seat plus standard API rates under the new pricing model.
Opus 4.7 supports 1 million tokens (approximately 750,000 words or roughly 2,000 pages of text) as its default context window. This is the same maximum as Opus 4.6 offered, but 4.6 required a beta flag to enable it. In 4.7, 1M is the default behavior for all API calls.
Yes, for nearly every use case. The upgrade is a model ID change with no breaking behavior differences, no price increase, and measurable improvement on coding tasks and long-context work. The only reason to delay is if you have a production stack with narrowly-tuned 4.6 behavior that needs retesting before rollover.
Access to Mythos is restricted to Project Glasswing, a controlled-access program for vetted enterprise partners in cybersecurity, cloud infrastructure, and financial services. Anthropic's stated reason is that Mythos can find and exploit software vulnerabilities at a level that creates real-world risk if released broadly. Our full Mythos analysis covers the capability profile and the regulatory response.
Last updated: April 18, 2026. Sources: Anthropic: Claude Opus 4.7 announcement · Anthropic: Project Glasswing / Mythos documentation · GitHub: Claude Code issue #42796 · The Register: Claude outage and quality analysis.
Related reading: Claude Opus 4.6 Review · Claude Mythos: Too Dangerous to Release · Claude's Hidden Performance Cut · Best AI Coding Assistants 2026 · Anthropic vs OpenAI 2026