Anthropic Says AI Agents Found $4.6 Million Of Exploits In Simulated Blockchain Smart Contracts

AI agents haven’t yet gone mainstream, but in experimental settings they’re showing serious potential for real-world impact.

In a new research project, Anthropic has demonstrated that its Claude AI models can autonomously identify and exploit vulnerabilities in blockchain smart contracts worth millions of dollars—at least in simulation. The company’s most capable models collectively discovered exploits worth $4.6 million across contracts that were compromised in 2025, establishing what researchers call a concrete lower bound for the economic harm these AI capabilities could enable.

The research, conducted through Anthropic’s MATS and Fellows programs, introduces SCONE-bench, a benchmark comprising 405 smart contracts with real-world vulnerabilities exploited between 2020 and 2025. Smart contracts are programs deployed on blockchains like Ethereum that handle financial transactions entirely through software, making them an ideal testing ground for measuring AI exploitation capabilities in dollar terms rather than arbitrary success rates.

Anthropic’s findings reveal a striking acceleration in AI cyber capabilities. The company’s Claude Opus 4.5 model successfully exploited 50% of the 34 contracts compromised after March 2025, corresponding to $4.5 million in simulated stolen funds. Across all tested models, exploit revenue doubled roughly every 1.3 months over the past year, driven by improvements in what researchers describe as agentic capabilities like tool use, error recovery, and long-horizon task execution.

Perhaps most concerning, the research went beyond retrospective analysis. When tested against 2,849 recently deployed contracts with no known vulnerabilities, both Claude Sonnet 4.5 and OpenAI’s GPT-5 independently uncovered two novel zero-day exploits worth $3,694 in simulated revenue. GPT-5 achieved this at an API cost of just $3,476, demonstrating what Anthropic calls a proof of concept that profitable, real-world autonomous exploitation is technically feasible today.

The economics of AI-powered exploitation are becoming increasingly favorable for attackers. Anthropic found that the average cost per agent run was just $1.22, while the median number of tokens required to produce a successful exploit has declined by 70% across four generations of Claude models. In practical terms, an attacker today can obtain roughly 3.4 times more successful exploits for the same compute budget as six months ago.
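That multiplier follows roughly from the token figures. The sketch below is a back-of-the-envelope check, not a calculation from Anthropic’s report, and it assumes compute cost scales with tokens consumed: if each successful exploit now takes about 30% of the tokens it once did, a fixed budget buys roughly 1 / 0.3 ≈ 3.3 times as many exploits, in line with the cited 3.4x figure.

```python
# Back-of-the-envelope check of the "more exploits per budget" claim,
# assuming compute cost is roughly proportional to tokens consumed.
# The 70% figure is from the article; everything else is illustrative.

token_decline = 0.70                 # reported ~70% drop in median tokens per successful exploit
relative_cost = 1 - token_decline    # each exploit now costs ~30% of what it used to

exploits_per_budget = 1 / relative_cost
print(f"Exploits per fixed compute budget: ~{exploits_per_budget:.1f}x")  # ~3.3x, close to the cited 3.4x
```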

One of the novel vulnerabilities discovered involved a token contract where developers forgot to add a read-only modifier to a public calculator function, inadvertently giving it write permissions. The AI agent exploited this flaw to repeatedly inflate its token balance before selling the tokens for approximately $2,500 in simulated profit. At peak liquidity, this vulnerability could have yielded nearly $19,000. After Anthropic coordinated with blockchain security firm SEAL, an independent white-hat hacker rescued the at-risk funds and returned them to their rightful owners.
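Anthropic has not published the vulnerable contract, so the sketch below is purely illustrative (and in Python rather than Solidity). It shows the general shape of the bug: a balance “calculator” that was meant to be read-only but persists its result, letting a caller mint tokens simply by invoking it in a loop.

```python
# Hypothetical, simplified illustration of the bug class described above:
# a "calculator" intended to be read-only that accidentally writes back to state.

class ToyToken:
    def __init__(self):
        self.balances = {"attacker": 100, "pool": 1_000_000}

    def projected_balance(self, account, bonus_rate=0.10):
        """Intended as a read-only projection of a holder's balance plus rewards."""
        projected = int(self.balances[account] * (1 + bonus_rate))
        self.balances[account] = projected   # BUG: the projection is persisted, minting new tokens
        return projected


token = ToyToken()
for _ in range(50):                          # an agent can simply loop the call to inflate its balance
    token.projected_balance("attacker")
print(token.balances["attacker"])            # far above the original 100
```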

The second zero-day exploit involved a token launch service that failed to validate fee recipient addresses, allowing anyone to withdraw trading fees meant for legitimate beneficiaries. Four days after the AI agent’s discovery, a real attacker independently exploited the same flaw and stole approximately $1,000.
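The shape of this second bug is just as easy to sketch. Again, the real contract is not public; the hypothetical example below simply shows a fee-withdrawal path that never checks whether the caller, or the recipient they supply, is the registered beneficiary, so anyone can sweep accrued fees to themselves.

```python
# Hypothetical sketch of the second bug class: a fee-withdrawal path with no
# check that the caller or recipient is the registered beneficiary.

class ToyFeeVault:
    def __init__(self, beneficiary, accrued_fees):
        self.beneficiary = beneficiary      # address the trading fees are meant for
        self.accrued_fees = accrued_fees    # fees collected so far
        self.payouts = {}                   # recipient -> amount paid out

    def withdraw_fees(self, caller, recipient):
        # BUG: no check that caller (or recipient) matches self.beneficiary,
        # so anyone can sweep the accrued fees to an address they control.
        amount, self.accrued_fees = self.accrued_fees, 0.0
        self.payouts[recipient] = self.payouts.get(recipient, 0.0) + amount
        return amount


vault = ToyFeeVault(beneficiary="token_creator", accrued_fees=1_000.0)
stolen = vault.withdraw_fees(caller="attacker", recipient="attacker")
print(stolen)  # 1000.0, paid to the attacker rather than the creator
```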

Anthropic emphasized that all testing occurred in blockchain simulators with no impact on real-world assets. The company is releasing the benchmark publicly despite dual-use concerns, arguing that attackers already have strong financial incentives to build these tools independently, while open-sourcing the benchmark gives defenders the ability to stress-test and fix contracts before exploitation occurs.

The research has implications extending far beyond blockchain. The same capabilities that enable agents to exploit smart contracts—including long-horizon reasoning, boundary analysis, and iterative tool use—apply to all kinds of software. As one researcher involved in the project noted, open-source codebases may face the first wave of automated scrutiny, but proprietary software is unlikely to remain unstudied for long as agents improve at reverse engineering.

This is the latest in a growing list of examples of AI agents finding vulnerabilities in software systems. In July this year, Google said that its cybersecurity AI agent, named ‘Big Sleep’, had helped it prevent a security incident. “We believe this is a first for an AI agent – definitely not the last – giving cybersecurity defenders new tools to stop threats before they’re widespread,” Google CEO Sundar Pichai said at the time. Last month, Anthropic said Chinese hackers had used its coding tools to run espionage operations.

And with the new disclosure, Anthropic’s message to the security community is clear: the same agents capable of exploiting vulnerabilities can also be deployed to patch them. The company argues that now is the time for defenders to adopt AI for defense, as the window between vulnerable code deployment and exploitation continues to shrink with each advancement in AI capabilities.
