Law Professors Prefer AI Answers Over Those Of Their Peers, Finds Study

AI is starting to match, and on some tasks surpass, top practitioners in some of the highest-paid professions.

A new study by researchers from Stanford and other leading U.S. law schools has delivered striking evidence of this shift in one of the most demanding professional domains: legal education. Titled “Law Professors Prefer AI Over Peer Answers,” the paper finds that when law professors were asked to blindly choose between short-answer responses written by their colleagues and those generated by large language models (LLMs), they overwhelmingly preferred the AI versions.

The study, published May 27, 2026, involved sixteen contract law professors from fourteen U.S. law schools who all teach from the same casebook. Participants first created 40 representative office-hours-style questions across categories such as case recall, doctrine, hypotheticals, and policy. They then wrote their own answers and judged 2,918 anonymized pairwise comparisons between human and LLM responses.

Clear Preference for AI

Responses from Google’s Gemini 2.5 Pro won 75.92% of their comparisons against human instructors, while NotebookLM (a retrieval-augmented version grounded in the casebook) won 74.75%. The models performed on par with the strongest human participants and, in some analyses, outperformed every instructor. Every judge in the study preferred LLM answers over peer responses on average, with a median LLM-preference rate of 75.81%.
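To make the win-rate figures concrete, here is a minimal Python sketch of how an overall win rate and a per-judge preference rate could be computed from pairwise trials. The trial records, judge labels, and function names are hypothetical illustrations, not the study's data or code.

```python
from collections import defaultdict
from statistics import median

def llm_win_rate(trials):
    """Share of pairwise trials in which the judge picked the LLM answer."""
    wins = sum(1 for _, winner in trials if winner == "llm")
    return wins / len(trials)

def per_judge_rates(trials):
    """LLM-preference rate for each judge, keyed by judge label."""
    counts = defaultdict(lambda: [0, 0])  # judge -> [llm wins, total trials]
    for judge, winner in trials:
        counts[judge][1] += 1
        if winner == "llm":
            counts[judge][0] += 1
    return {judge: wins / total for judge, (wins, total) in counts.items()}

# Hypothetical trials: (judge, which answer the judge preferred).
trials = [
    ("A", "llm"), ("A", "llm"), ("A", "human"), ("A", "llm"),
    ("B", "llm"), ("B", "human"), ("B", "llm"), ("B", "human"),
    ("C", "llm"), ("C", "llm"), ("C", "llm"), ("C", "llm"),
]

overall = llm_win_rate(trials)           # pooled over all trials
rates = per_judge_rates(trials)          # one rate per judge
median_rate = median(rates.values())     # median across judges
```

The pooled rate and the median across judges answer different questions: the first weights every trial equally, while the second shows what a typical judge preferred, which is why the article reports both.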

Notably, the advantage held across all question types, including complex hypotheticals and policy questions that require nuanced judgment rather than rote recall. AI responses were also flagged as pedagogically harmful far less often (3.53% pooled rate) than professor-written answers (12.06% average).

The researchers went further by engineering textual features such as length, clarity, structure, and pedagogical support to test whether surface-level polish explained the results. It didn’t. LLMs consistently outperformed predictions based on these features alone, suggesting the advantage stems from substantive reasoning quality.

Shared Professional Standards

To determine whether this reflected genuine alignment with expert standards or mere stylistic appeal, the team analyzed inter-judge agreement on overlapping trials. Agreement exceeded what would be expected from purely idiosyncratic preferences, indicating that LLMs were capturing latent professional norms that the professors themselves endorse.
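One common way to ask whether two judges agree more than chance would predict is Cohen's kappa. The sketch below is an illustration under that assumption; the article does not say which agreement statistic the researchers used, and the judge verdicts shown are invented.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability both pick the same label if each rater
    # chose independently according to their own label frequencies.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

# Hypothetical verdicts from two judges on the same eight overlapping trials.
judge_1 = ["llm", "llm", "human", "llm", "human", "llm", "llm", "human"]
judge_2 = ["llm", "llm", "human", "llm", "llm", "llm", "human", "human"]

kappa = cohens_kappa(judge_1, judge_2)
```

A kappa near zero would suggest the judges' choices are no more aligned than independent, idiosyncratic preferences; values clearly above zero point to a shared standard.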

Using an “LLM-as-judge” framework validated against human evaluators, the researchers extended the ranking to newer models. Claude Opus 4.7 topped the list, followed by other frontier systems, and all of them outperformed human instructors. Reasoning-focused variants, such as Gemini 2.5 Flash with a thinking budget, significantly outperformed their non-reasoning counterparts.

Implications for Legal Education and Beyond

The findings challenge assumptions about AI’s limitations in high-judgment fields. While many prior evaluations focused on objective accuracy, this study tested AI against the subjective but shared standards of expert practitioners—the very essence of legal training.

For law schools facing instructor capacity constraints, the results point to a practical opportunity: always-available AI tutors that can deliver high-quality short answers aligned with professional expectations. The authors suggest implementations with clear guardrails, citations to source material, and escalation paths to human faculty.

The paper also highlights a curious detail: stock Gemini 2.5 Pro often outperformed RAG-grounded variants (including a commercial AI tutor built on the same base model), raising questions about context dilution in long-document retrieval.

As AI capabilities continue to advance rapidly, this research underscores a broader trend. In domains where success depends on reasoned judgment rather than single ground truths, frontier models are not just matching experts—they are frequently preferred by them. For businesses, technologists, and educators, the message is clear: the integration of AI into professional knowledge work is accelerating, even in fields long considered resistant to automation.