Claude Code Found A Linux Vulnerability That Had Remained Hidden For 23 Years, Says Anthropic Researcher

AI isn’t just getting good at coding; it’s also finding decades-old bugs in systems designed by some of the best engineers on the planet.

Nicholas Carlini, a research scientist at Anthropic, revealed at the [un]prompted AI security conference that he used Claude Code to uncover multiple remotely exploitable security vulnerabilities in the Linux kernel — including one that had gone undetected for 23 years. The finding is a signal that AI has crossed a meaningful threshold in cybersecurity: not merely assisting human researchers, but independently reasoning through some of the most complex software ever written.

The Bug Claude Found

The vulnerability Carlini highlighted lives in the Linux kernel’s Network File System (NFS) code and lets a remote attacker overwrite kernel memory over the network. What makes it notable isn’t just its severity but the depth of protocol understanding required to find it.

The attack involves two cooperating NFS clients. Client A acquires a file lock and declares an unusually large (but technically legal) 1,024-byte owner ID. When Client B then tries to claim the same lock, the server denies the request and generates a response — but encodes that response into a buffer only 112 bytes in size. The full denial message, including the owner ID, comes to 1,056 bytes. The kernel writes 1,056 bytes into a 112-byte buffer, letting an attacker overwrite kernel memory with bytes they control.

The bug was introduced in March 2003, predating Git itself. It survived two decades of expert review, fuzzing tools, and static analysis — none of which could catch it because none of them understood the NFS protocol well enough to see the implication.

Claude did.

How Little It Took

What’s striking is how minimal the human involvement was. Carlini essentially pointed Claude Code at the Linux kernel source and asked it to find vulnerabilities, using a simple script that looped over every source file and told Claude to focus on each one in turn — a way of preventing the model from fixating on the same bug repeatedly. No hand-holding. No curated hints. Claude generated the full bug report, including the ASCII protocol diagrams that illustrated the attack chain.

Carlini put it plainly at the conference:

“We now have a number of remotely exploitable heap buffer overflows in the Linux kernel. I have never found one of these in my life before. This is very, very, very hard to do. With these language models, I have a bunch.”

The Bottleneck Is Now Human Triage

If anything, the constraint is no longer Claude’s ability to find bugs — it’s humans’ ability to verify them fast enough. Carlini says he has hundreds of additional crash reports he hasn’t been able to validate yet, and won’t send unverified findings to Linux kernel maintainers. The bottleneck, in other words, has inverted: AI is producing faster than humans can review.

Five vulnerabilities from Carlini’s work have been fixed or reported to kernel maintainers so far, including the NFS heap overflow, an out-of-bounds read in io_uring, and two separate bugs in the ksmbd SMB server implementation.

A Rapidly Widening Capability Gap

The improvement in AI’s vulnerability-finding ability over just a few months is steep. Carlini tested the same workflow on older models and found that Claude Opus 4.1 and Sonnet 4.5 — released eight and six months ago respectively — could find only a small fraction of what Opus 4.6 surfaces. The chart from his talk tells a sharp story: earlier models barely register, while Opus 4.6 finds substantially more bugs than even its immediate predecessor.

This is part of a broader pattern. Anthropic has used Opus 4.6 to find over 500 zero-day vulnerabilities across production open-source codebases, including 22 in Firefox over two weeks, in a codebase that ranks among the most rigorously tested in existence. One Firefox vulnerability was flagged within 20 minutes of Claude being pointed at the current code.

Separately, OpenAI’s o3 model discovered a use-after-free vulnerability in the Linux kernel’s SMB implementation by analyzing over 12,000 lines of code and identifying a race condition that traditional static analysis tools consistently missed. An AI security startup called AISLE was credited with finding all 12 zero-day vulnerabilities in OpenSSL’s January 2026 security patch, including a high-severity stack buffer overflow potentially exploitable without valid key material.

The Linux kernel’s own lead maintainer, Greg Kroah-Hartman, has publicly noted the shift. Months ago, AI-generated security reports were largely noise — what developers called “AI slop.” Then, roughly a month ago, the quality changed. “Something happened a month ago, and the world switched,” he said. “Now we have real reports.”

The Dual-Use Problem

The same capability that makes Claude useful for defenders makes it useful for attackers. Anthropic acknowledges this directly: if the gap between finding and exploiting vulnerabilities narrows, the risk calculus changes significantly. For now, Claude is substantially better at discovering bugs than at weaponizing them — a meaningful but not permanent asymmetry.

To act on this window, Anthropic has launched Claude Code Security, a research preview that scans codebases for vulnerabilities and suggests patches for human review. It’s available to Enterprise and Team customers, with expedited access for open-source maintainers.

The broader context matters here too. Claude Code is now 100% written by Claude Code — a recursive loop that’s accelerating the tool’s own capabilities. Claude Code accounts for roughly 4% of all public GitHub commits. The same model that finds kernel vulnerabilities is also autonomously building C compilers and writing fraud detection systems from scratch.

What Comes Next

The security community has been aware, in the abstract, that AI would eventually reshape vulnerability research. What Carlini’s work shows is that the transition isn’t gradual — it’s already happened. The tools exist now. The bugs are being found now. The question of who uses them first — defenders or attackers — is one that organizations can no longer treat as theoretical.

Carlini’s closing note from the conference was unambiguous: “I expect to see an enormous wave of security bugs uncovered in the coming months, as researchers and attackers alike realize how powerful these AI models are at discovering security vulnerabilities.”

For security teams, that wave is already at the shore.
