Claude Opus 4.8 Is Better Than Opus 4.7 But Not As Good As Mythos Preview, Says Anthropic

Anthropic has released a new model but maintains that it is still not as capable as Claude Mythos Preview, which was released only to a small set of institutions under Project Glasswing.

Claude Opus 4.8 improves on Opus 4.7 across nearly all of the company’s internal benchmarks — spanning software engineering, reasoning, agentic tasks, and multimodal capabilities. But Anthropic is careful to frame this as a step forward within a defined ceiling: the model does not surpass Claude Mythos Preview, the frontier model that Anthropic restricted to select partners under Project Glasswing earlier this year. That distinction matters, especially given what Mythos Preview can do.

Biological and Virology Risk: Bounded, But Not Zero

An interesting section of the model card covers biological risk. Anthropic ran a battery of automated evaluations (long-form virology tasks, a Virology Capabilities Test (VCT), and DNA Synthesis Screening Evasion) comparing Opus 4.8 against both its predecessor and Mythos Preview.

The numbers show Opus 4.8 consistently scoring lower than Mythos Preview across several sub-tasks. On the DNA Synthesis Screening Evasion evaluation, for instance, Opus 4.8 scored 0.30 on Criterion 1 against Mythos Preview’s 0.842. That is a significant gap, and here a lower score reflects less ability to evade biosecurity screening, which is the safer outcome. On the Virology Capabilities Test, Opus 4.8 scored 0.470 versus Mythos Preview’s 0.574.

Anthropic’s framing here is deliberate: the company says its “overall conclusion is that Opus 4.8 does not advance the capability frontier beyond our most capable model.” In other words, the ceiling for catastrophic biological risk is still set by Mythos Preview, not by Opus 4.8. That’s a meaningful distinction for Anthropic’s Responsible Scaling Policy (RSP) commitments — the company is essentially saying this model doesn’t trigger a new tier of concern.

On the chemistry side, Anthropic did not conduct fresh red-teaming for Opus 4.8. Instead, it argues that since Opus 4.8 doesn’t exceed Mythos Preview on automated biological and organic chemistry evaluations, the chemical risk profile is “bounded by prior findings.” The company maintains blocking classifiers for high-priority chemical weapons content and monitoring for chemical risks.

Cyber: More Capable Unshackled, Roughly Equal With Guardrails

On cybersecurity benchmarks — some used in a system card for the first time — Opus 4.8 without safeguards is modestly more capable than Opus 4.7, the model Anthropic launched alongside Claude Design last month. With safeguards applied, the two models perform comparably. Both remain substantially weaker than Mythos Preview on cyber tasks — which is worth noting given that Mythos Preview’s autonomous zero-day discovery capability was precisely what prompted OpenAI to fast-track GPT-5.5-Cyber.

Alignment: Cleaner Than Its Predecessor

Anthropic says Opus 4.8 is an improvement over Opus 4.7 on most alignment measures and shows a profile close to Mythos Preview on these dimensions. That’s the more encouraging headline in the model card. Given that Claude Opus 4.6 was caught decrypting benchmark answer keys to game its own evaluations, meaningful alignment progress is not something to gloss over.

What This Means for Enterprises

For most enterprise customers, the operational takeaway is straightforward: Opus 4.8 is a better, same-priced upgrade over 4.7, with meaningful gains in agentic coding, multi-agent workflows, and long-context tasks. The safety story, in short, is that it’s more capable than the last model but the risk ceiling hasn’t moved. Claude Mythos Preview, however, remains Anthropic’s most powerful offering, and it may be a while before it’s made widely available to businesses and individuals.
