By AI Tool Briefing Team

Claude Opus 4.7 Review: Anthropic's Second-Best


On April 16, 2026, Anthropic shipped Claude Opus 4.7 — their new flagship, same price as Opus 4.6, measurably better at code. The benchmark they led with: 87.6% on SWE-bench Verified, a real jump over 4.6. The context window: 1 million tokens, now the default. And the other thing they said, on the same day, in the same news cycle: we have a better model than this one. We’re not selling it to you. Claude Mythos scores 93.9% on SWE-bench and 97.6% on USAMO — and it’s only available to the Project Glasswing partner cohort because it can find and exploit production zero-days at a level that makes a general release indefensible.

So here’s the question this review needs to answer. When your vendor publicly says “this is our second-best” on launch day, what exactly are you buying?

Quick Verdict

| Aspect | Rating |
|---|---|
| Overall Score | ★★★★☆ (4.4/5) |
| Best For | Production coding workflows, long-context analysis, agent pipelines |
| Pricing | $5 / $25 per million input/output tokens (same as Opus 4.6) |
| SWE-bench Verified | 87.6% (Mythos: 93.9%) |
| Context Window | 1M tokens (default, not a beta flag) |
| Coding Ability | ★★★★★ |
| Value vs. Opus 4.6 | ★★★★☆ |

Bottom line: A genuine upgrade at no added cost over Opus 4.6 — if you already run Claude for coding or long-context work, 4.7 is a no-brainer drop-in. The awkward part is what the same launch said about the ceiling you’re not allowed to reach.


What Opus 4.7 Actually Delivers

Three things matter on the spec sheet, and only three.

87.6% on SWE-bench Verified. SWE-bench Verified is the coding benchmark that isn’t easy to game — it measures end-to-end resolution of real GitHub issues in real open-source repos, with human-verified ground truth. Opus 4.6 was already strong here. 4.7 pushes the number five to six points higher, and that’s on the same model scaffolding (tool use, file access, agent loop) that production teams already have wired up.

1 million token context window — shipped, not previewed. Opus 4.6 introduced the 1M-token window, but it shipped behind a beta flag in the API, and Claude Code needed a toggle to use it. In 4.7, 1M is the default context. That’s a quiet but meaningful change. It means your long-context agent runs no longer require special handling, and the fallback behavior when you exceed 200K is no longer a surprise.

Same price as Opus 4.6. $5 per million input tokens, $25 per million output tokens. Anthropic could have charged a premium for 4.7 and most of their customer base would have paid it. They didn’t. For enterprise teams planning the next twelve months of AI spend, that’s the single most useful number on this page.
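To make the budgeting math concrete, here’s a minimal sketch of a monthly cost projection at the $5/$25 rates. The traffic volumes are invented for illustration; only the per-token rates come from the announcement.

```python
# Per-token rates from Anthropic's published Opus 4.7 pricing.
INPUT_RATE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 25.00 / 1_000_000  # dollars per output token

# Hypothetical monthly volumes for a mid-sized agent pipeline.
monthly_input_tokens = 2_000_000_000   # 2B input tokens (assumed)
monthly_output_tokens = 150_000_000    # 150M output tokens (assumed)

cost = (monthly_input_tokens * INPUT_RATE
        + monthly_output_tokens * OUTPUT_RATE)
print(f"Projected monthly spend: ${cost:,.2f}")  # -> $13,750.00
```

Because 4.7 holds both rates flat, the projection you ran for 4.6 carries over unchanged.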

That’s the offer. Better coding, context window no longer gated, price held flat. For a mid-cycle upgrade from a frontier lab, that’s a clean release.


The Mythos Asterisk — What You’re Not Getting

Here’s where reviewing a model gets structurally weird.

On the same announcement day, Anthropic said the quiet part out loud: Claude Mythos Preview — the model they’re running internally — scores 93.9% on SWE-bench Verified and 97.6% on the USAMO (USA Mathematical Olympiad), a math reasoning benchmark where expert humans score in the low 90s. Those numbers aren’t an incremental step past 4.7. They’re a different tier.

And you can’t use it. Access is restricted to the Project Glasswing program, a cohort of security-weighted enterprise partners — AWS, Microsoft, Google, CrowdStrike, Palo Alto Networks, Cisco, and others — at $25/$125 per million tokens. The reason, per Anthropic’s own documentation, is that Mythos has “reached a level of coding capability where it can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.” During preview, the model found a 27-year-old vulnerability in OpenBSD along with critical flaws in every major operating system and browser.

I’m not arguing that’s wrong. Our Mythos coverage makes the case that Anthropic’s restraint here is defensible — this is a genuine dual-use capability, and shipping it to anyone with API access would be reckless.

But it’s worth sitting with what the launch architecture implies. Anthropic published “our best model is too dangerous for you” and “here’s our new flagship” on the same date. The flagship is 4.7. It’s what you’re allowed to buy. It scores six points below the internal model on the benchmark Anthropic chose to headline. That’s the asterisk.


Opus 4.7 vs Opus 4.6: Should You Upgrade?

Short answer: yes, and it costs you nothing.

Where 4.7 Pulls Ahead

| Metric | Opus 4.6 | Opus 4.7 |
|---|---|---|
| SWE-bench Verified | ~82% | 87.6% |
| Context window (default) | 200K (1M as beta) | 1M |
| Input price (per 1M tokens) | $5 | $5 |
| Output price (per 1M tokens) | $25 | $25 |
| Availability | Claude API, Pro, Amazon Bedrock, Vertex AI, Foundry | Same |

The SWE-bench jump is where the practical difference lives. In agentic coding workflows (the kind Stella Laurenzo’s AMD analysis was measuring during the Claude performance incident earlier this month), a 5-6 point lift on this benchmark correlates with fewer user interrupts, tighter Read:Edit ratios, and fewer retries per completed task. It’s not a cosmetic number.

Where It’s Not Dramatically Different

Reasoning on non-coding tasks (multi-document analysis, strategic synthesis, writing) is marginally better. Not revolutionary. If you’re using Opus 4.6 for legal review, market analysis, or long-form writing, you’ll notice minor sharpness improvements. You won’t notice a different product.

Speed is roughly the same. 4.7 uses the adaptive thinking behavior introduced in 4.6, and response latency on typical prompts is in the same range.

Upgrade Decision

If you’re on Opus 4.6 via the API: the model ID change is the entire migration. No new SDK behaviors, no breaking changes to tool use or caching. Your agent loops keep working.
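Here’s what that looks like in practice: a minimal sketch using the Anthropic Python SDK. The model ID strings are illustrative guesses, not confirmed identifiers; check the models list in your dashboard for the exact values.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The entire migration: swap the model ID string.
# Both IDs below are illustrative placeholders, not confirmed identifiers.
MODEL = "claude-opus-4-7"  # previously: "claude-opus-4-6"

response = client.messages.create(
    model=MODEL,
    max_tokens=4096,
    messages=[{"role": "user", "content": "Summarize the open issues in this repo."}],
)
print(response.content[0].text)
```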

If you’re on Claude Pro at $20/month: the upgrade happens automatically when Anthropic flips the default, which they’ve said will roll out over the next couple of weeks.

If you’re on Claude Max or the new enterprise tier with its per-seat-plus-usage pricing — see our coverage of the pricing shift — your bill changes based on consumption, not model selection, so 4.7 is effectively free to adopt.

There’s no reasonable scenario where you stay on 4.6 unless you have a specific quirk of your stack that depends on 4.6’s exact behavior.


What 87.6% on SWE-bench Actually Buys You

Benchmarks can feel abstract. Here’s what the 4.6 → 4.7 coding improvement translates to in practice, based on what production teams running agent pipelines tend to track.

Fewer retries per completed ticket. SWE-bench scores roughly track the rate at which an AI coding agent closes a real GitHub issue cleanly on first attempt. A 5-6 point lift means a measurable reduction in the “agent went off-rails and I had to intervene” failure mode.

Longer autonomous runs before a human has to step in. At 82%, an Opus 4.6 agent could handle maybe 3-5 chained tasks before enough drift accumulated to require supervision. At 87.6%, that ceiling moves up — not to unlimited, but to a range where overnight runs on well-scoped batch jobs start being viable.

The Read:Edit ratio also holds better. Stella Laurenzo’s forensic analysis at AMD showed that when Claude was throttled in March, the ratio collapsed from 6.6 to 2.0, meaning the model was editing code without reading enough context first. 4.7 pushes it back toward the healthy range. If you’ve built monitoring on this signal, you’ll see it.
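If you want to watch this signal in your own pipeline, here’s a minimal sketch of the ratio computation. The event schema is invented for illustration; adapt it to however you log agent tool calls.

```python
from collections import Counter

def read_edit_ratio(events: list[dict]) -> float:
    """Compute the Read:Edit ratio from a list of tool-call events.
    Each event is assumed to look like {"tool": "Read", ...}; this
    schema is a placeholder, not a real Claude Code log format."""
    counts = Counter(e["tool"] for e in events)
    edits = counts.get("Edit", 0)
    return counts.get("Read", 0) / edits if edits else float("inf")

# Hypothetical session trace: 6 reads per edit sits in the healthy range;
# a ratio near 2.0 suggests the model is editing without enough context.
session = [{"tool": "Read"}] * 6 + [{"tool": "Edit"}]
print(f"Read:Edit = {read_edit_ratio(session):.1f}")  # -> Read:Edit = 6.0
```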

Is this worth the same price as 4.6? Yes, because it’s the same price. Would it have been worth a premium? Probably a small one. The fact that Anthropic didn’t charge one is either a competitive response to GPT-5.4’s positioning or a deliberate signal to enterprise customers still metabolizing the pricing overhaul from earlier this month. Probably both.


The 1M Context Window — Finally Default

Context window depth is one of those features that sounds marginal until you hit it.

With 1M tokens as the default, a single Claude call can now hold roughly 750,000 words — the length of a medium book, or an entire production codebase of 50-100K lines, or 2,000+ pages of legal discovery material. Opus 4.6 had this capability but behind a beta flag. 4.7 just has it on.

Practical consequences:

  • Codebases no longer need pre-slicing. If your repo is under ~100K lines of code, you can load the whole thing and let the model reason across files without manual chunking.
  • Long-form legal, financial, and research analysis gets cleaner. You’re no longer making lossy decisions about what to include in a prompt.
  • Agent scratchpads can persist longer mid-task. The tool-use trace no longer eats into a tight budget for the actual work.

The counterweight: retrieval quality at 1M tokens is genuinely good but not uniform. Models still exhibit what’s been called “lost-in-the-middle” behavior — information placed near the beginning or end of long contexts is retrieved more reliably than information buried in the middle. Opus 4.7 is better at this than 4.6 but isn’t perfect. For critical retrieval tasks, structured context (with explicit section markers) still outperforms dumping a raw 900K-token blob.
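As a concrete sketch of “structured context,” here’s one way to load a repo into a single prompt with explicit per-file markers instead of a raw concatenation. The marker format is our own convention for this example, not an Anthropic-documented one.

```python
from pathlib import Path

def build_repo_context(root: str, exts=(".py", ".ts", ".md")) -> str:
    """Concatenate a codebase into one prompt, wrapping each file in an
    explicit marker so long-context retrieval has anchors to latch onto.
    The <file path=...> tags are our own convention, not a documented one."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in exts:
            parts.append(f'<file path="{path}">\n{path.read_text(errors="ignore")}\n</file>')
    return "\n\n".join(parts)

# One call, whole repo: viable under ~100K lines at the 1M-token default.
context = build_repo_context("./my-repo")
prompt = f"{context}\n\nWhere is retry logic implemented, and is it consistent across modules?"
```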

For a fuller comparison of how the frontier models handle long context, our AI models comparison for 2026 walks through Gemini 3, GPT-5.2, and Claude at various context depths.


Where Opus 4.7 Still Struggles

The honest limitations section.

Math reasoning. Opus 4.7 is strong here but not leading. Anthropic didn’t publish a USAMO number for 4.7 at all, presumably because the comparison with Mythos’s 97.6% would be unflattering. If your workflow is heavy on formal math, Gemini 3 Pro’s thinking mode or GPT-5.4 in extended reasoning still have advantages.

Agentic computer use. Still a real capability (see our Claude computer use review) and improved from 4.6, but the reliability gap between “Claude can navigate a GUI” and “Claude reliably completes a 30-step GUI workflow unattended” hasn’t closed.

Cost at scale. The $5/$25 pricing is competitive for frontier capability but still a premium compared to Sonnet-tier or GPT-5.2-mini. For workflows that don’t need Opus-level reasoning, you’re overpaying. Our AI cost optimization guide has a framework for routing work to the right model tier.
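The routing idea itself fits in a few lines. Everything below (the tier names, the keyword heuristic) is an illustrative assumption, not the framework from the linked guide.

```python
def pick_model(task: str, needs_long_context: bool) -> str:
    """Route a task to the cheapest tier that can plausibly handle it.
    Model IDs and the keyword heuristic are hypothetical placeholders."""
    hard_signals = ("refactor", "multi-step", "architecture", "security")
    if needs_long_context or any(s in task.lower() for s in hard_signals):
        return "claude-opus-4-7"   # frontier tier: $5/$25 per 1M tokens
    return "claude-sonnet-4-x"     # cheaper tier for routine work

print(pick_model("rename a config variable", needs_long_context=False))
# -> claude-sonnet-4-x
```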

The ceiling problem. This is the one that’s hardest to quantify. You are, by Anthropic’s own admission, not using their best model. That’s fine for most work. It’s uncomfortable to think about for work where that 6-point gap matters — cybersecurity analysis, complex multi-step reasoning, edge cases in scientific computing.


Who Should Upgrade

You Should Move to 4.7 If:

  • You’re running any production Claude coding workflow. The SWE-bench lift pays for itself immediately.
  • You do long-context work (legal, financial analysis, codebase exploration) and have been managing around the 200K default.
  • You’re building multi-step agent pipelines where each step’s reliability compounds.

You Can Stay on 4.6 If:

  • You have a frozen stack with specific 4.6 behavior tuned in, and your renewal cycle is imminent — wait until you have bandwidth to retest.
  • Honestly, that’s mostly it. There’s very little reason not to migrate.

You Should Look Elsewhere If:

  • Your primary need is GUI automation or expert-level vulnerability research: GPT-5.4 Operator and the Project Glasswing cohort represent capabilities Opus 4.7 doesn’t reach.
  • You’re price-sensitive at high volume: Sonnet 4.x, Gemini 2.5 Flash, or GPT-5.2-mini will cover 70% of workloads for a fraction of the cost. See our best AI coding assistants guide for the full tradeoff matrix.

Our Take

Opus 4.7 is a clean, honest release. Better at coding, 1M context out of the box, same price. If someone asks me whether to upgrade, the answer is yes, and the whole conversation takes about ten seconds.

What’s harder is the framing Anthropic chose for launch day. Shipping “here’s our new flagship” and “here’s the model we won’t sell you” in the same news cycle is either disciplined transparency or a subtle way to make the thing you can buy feel like a consolation prize. Probably both.

I think the Mythos restriction is the right call — the capability profile genuinely is dual-use in ways that justify gating. But the commercial geometry of “pay $5/$25 for second-best” is new territory for a frontier lab, and it’s worth being honest about what that feels like. You’re buying a very good model from a company that has openly said there’s a better one.

For almost every practical use case, that doesn’t matter. 4.7 is better than what most teams were using yesterday. The price didn’t go up. The upgrade is free. The long-context default finally landed. That’s a good release.

The reason this review is four stars and not five is the ceiling. Anthropic published that there’s more on the table — and then put it out of reach, for structural reasons that are defensible but that still mean what they mean. You’re paying for second-best because second-best is what you’re allowed to pay for.

The previous release got five stars because there wasn’t an explicit better tier sitting next to it on the product page. This one has a label on it: “Claude Opus 4.7” — right next to the fine print saying “Mythos Preview — restricted access only.” The label does what labels do.


Frequently Asked Questions

What is Claude Opus 4.7?

Claude Opus 4.7 is Anthropic’s new flagship publicly available AI model, announced April 16, 2026. It scores 87.6% on SWE-bench Verified, ships with a 1 million token context window as the default, and is priced at $5 per million input tokens and $25 per million output tokens — the same pricing as the previous Opus 4.6 release.

How does Claude Opus 4.7 compare to Opus 4.6?

Opus 4.7 delivers roughly a 5-6 point improvement on SWE-bench Verified over 4.6, makes the 1M context window default instead of beta, and holds pricing flat at $5/$25 per million tokens. For most production coding and long-context workflows, it’s a strict upgrade at no additional cost.

Is Claude Opus 4.7 better than Claude Mythos?

No. Anthropic’s own benchmarks place Mythos Preview at 93.9% SWE-bench Verified and 97.6% USAMO, meaningfully ahead of Opus 4.7. However, Mythos is restricted to Project Glasswing enterprise partners due to its expert-level vulnerability exploitation capability. Opus 4.7 is the highest-capability model Anthropic makes available for general commercial use.

How much does Claude Opus 4.7 cost?

Opus 4.7 is priced at $5 per million input tokens and $25 per million output tokens via Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. For Claude Pro users, 4.7 is included in the $20/month consumer subscription. Enterprise customers pay per-seat plus standard API rates under the new pricing model.

What is the Claude Opus 4.7 context window?

Opus 4.7 supports 1 million tokens (approximately 750,000 words or roughly 2,000 pages of text) as its default context window. This is the same maximum Opus 4.6 offered, but 4.6 required a beta flag to enable it. In 4.7, 1M is the default behavior for all API calls.

Should I upgrade from Opus 4.6 to Opus 4.7?

Yes, for nearly every use case. The upgrade is a model ID change with no breaking behavior differences, no price increase, and measurable improvement on coding tasks and long-context work. The only reason to delay is if you have a production stack with narrowly tuned 4.6 behavior that needs retesting before rollover.

Why can’t I access Claude Mythos if it’s a better model?

Access to Mythos is restricted to Project Glasswing, a controlled-access program for vetted enterprise partners in cybersecurity, cloud infrastructure, and financial services. Anthropic’s stated reason is that Mythos can find and exploit software vulnerabilities at a level that creates real-world risk if released broadly. Our full Mythos analysis covers the capability profile and the regulatory response.


Last updated: April 18, 2026. Sources: Anthropic: Claude Opus 4.7 announcement · Anthropic: Project Glasswing / Mythos documentation · GitHub: Claude Code issue #42796 · The Register: Claude outage and quality analysis.

Related reading: Claude Opus 4.6 Review · Claude Mythos: Too Dangerous to Release · Claude’s Hidden Performance Cut · Best AI Coding Assistants 2026 · Anthropic vs OpenAI 2026