The discerning readers of one of the world's best-known publications now judge AI's writing to be slightly better than humans'.
The New York Times ran a blind writing quiz, authored by columnist Kevin Roose and journalist Stuart A. Thompson, presenting 86,000 readers with five pairs of writing samples across literary fiction, fantasy, science writing, historical fiction, and poetry. For each pair, readers were asked to choose the passage they preferred, without knowing which was written by a human and which by an AI. The result: 54% of participants preferred the AI-generated writing. We took the quiz ourselves and preferred the AI-written passage in three of the five pairs.

A Landmark Moment for AI Creativity
The quiz was designed to test one of the most enduring objections to artificial intelligence: that it cannot be truly creative because it lacks human experience. That objection is now under serious pressure. The NYT's methodology was rigorously blind: the AI was asked to select an existing piece of strong human writing and then craft its own version in its own voice. The side-by-side comparison gave readers no metadata, no bylines, and no hints, just the words themselves.
The margin — 54% to 46% — is narrow, but its symbolic weight is considerable. For decades, the assumption has been that while AI could handle functional writing (summaries, reports, product descriptions), the ineffable qualities of literary prose — rhythm, emotional resonance, metaphor — would remain a human preserve. This quiz suggests that assumption may be eroding faster than most people anticipated.
The finding also aligns with a broader pattern emerging across the industry. Several studies prior to the NYT quiz had already shown that in blind evaluations, readers often fail to distinguish between AI and human writing — and sometimes actively prefer the AI output. What makes the NYT result distinctive is the scale: 86,000 readers is a large, self-selected sample of people who are presumably both literate and, as NYT subscribers, accustomed to high-quality prose.
Breaking Down the Results
The quiz spanned five genres, and the results varied across them in telling ways. Some AI passages leaned into clarity and rhythm in ways that felt fresh rather than mechanical. In the science writing category, for instance, the AI-written passage competed directly against established science communication, the kind of elegant explanatory prose that magazines like The Atlantic and Scientific American have spent decades perfecting.
In poetry, the contrast was perhaps sharpest. One passage — written by AI — described finding a dead owl in a field, burying it near a fence post, and noticing how “the ground was cold and giving.” The imagery was restrained, observational, and genuinely moving. The human-authored passage, taken from Elizabeth Bishop’s celebrated poem “The Fish,” carried the weight of literary history. That readers were nearly split is remarkable.
Historical fiction showed a similar dynamic. The AI-generated passage — with its line about ambiguity being “not weakness” but “survival” — demonstrated that AI can construct not just sentences but voice. A narrative perspective. A point of view. These are the qualities that critics have long argued AI fundamentally cannot replicate.
What This Means for the Content Industry
The business implications are significant. Content is one of the largest cost centers for media companies, marketing agencies, and enterprise communications teams. If AI-generated prose is now indistinguishable from — or preferable to — human writing in blind evaluations, the economic logic for deploying it at scale becomes considerably stronger.
This fits a broader pattern of AI capability that has been advancing faster than most observers predicted. AI systems are already being described as surpassing human-level performance in certain domains, with Anthropic co-founder Daniela Amodei saying that by some definitions, AGI has already been reached in narrow areas. Creative writing may be quietly joining that list.
The trajectory is also being shaped by aggressive predictions from AI leaders. Anthropic CEO Dario Amodei has said that AI systems better than humans at nearly all tasks could arrive within two to three years — a claim that, in light of the NYT quiz, seems less hyperbolic than it might have a year ago.
The Coding Parallel
It’s worth drawing a parallel to what happened with software. Just a few years ago, the idea that AI would write production-quality code was met with skepticism from engineers. Today, Google says well over 30% of its code is written by AI, Microsoft reports similar numbers, and OpenAI employees say as much as 80% of their individual code output is AI-generated. The transition from “AI can assist with coding” to “AI writes most of the code” happened in a matter of months, not years.
The NYT quiz raises the question of whether creative writing is about to follow the same curve. The shift in coding happened gradually and then suddenly — first at the margins, then at the center. A 54% reader preference for AI writing may be the writing industry’s equivalent of the moment engineers first admitted that Copilot was genuinely useful.
What Remains Human
The quiz also reveals what AI still lacks — or at least, what it has not yet convincingly replicated. The human passages that resonated most deeply tended to carry a weight of specificity rooted in lived experience. Cormac McCarthy’s prose, for instance — one of the human passages quoted — has a texture that comes not just from word choice but from a worldview forged over decades. AI can approximate the rhythm of such writing; it is less clear it can replicate the conviction beneath it.
Elon Musk has predicted that AI will exceed the collective intelligence of all humans by 2030, while others in the AI community remain more cautious about timelines and definitions. The NYT quiz doesn’t settle those debates. What it does is shift the burden of proof. The question is no longer whether AI can write well enough to fool readers — it demonstrably can. The question now is what, if anything, human writing offers that AI cannot.
A New Benchmark
For publishers, brand content teams, and anyone whose work involves producing prose at scale, the NYT quiz is a useful benchmark — not because it proves AI is better than humans, but because it proves the gap has narrowed to the point where many readers cannot detect it. That is a commercially meaningful threshold, regardless of where one stands on the aesthetics.
The fact that it took 86,000 people and a major newspaper to surface this result publicly is itself instructive. The shift has been happening quietly, in enterprise content pipelines, marketing departments, and newsroom tools, long before it became front-page news. The NYT quiz has simply made the trend legible.