The legal and government sectors have always demanded precision, reliability, and nuanced reasoning, exactly the traits that separate the best AI models for legal work from general-purpose chatbots. Arena.ai’s Legal & Government leaderboard ranks models by real-world human preference votes on domain-specific prompts, which makes it one of the clearest performance signals available. Here’s a breakdown of every model in the current top 10: how it scores, what it costs, and what makes it worth considering for legal and government teams.

1. claude-opus-4-6-thinking — Rank #1
With an Arena score of 1513 (±13) and 2,627 votes, claude-opus-4-6-thinking sits at the top of the legal and government leaderboard. It’s the thinking variant of Claude Opus 4.6, Anthropic’s flagship model, which drew wide attention for its reasoning ability. Legal questions often turn on ambiguous language that rewards extended deliberation, and the thinking mode is designed to provide exactly that. Priced at $5 input / $25 output per million tokens, it sits at the premium end but delivers frontier performance. Its rank spread of 1–10 means its placement could still shift as votes accumulate; for now, it holds the top position.
2. muse-spark — Rank #2
Meta’s Muse Spark is the breakout entry on this list. Scoring 1511 (±21) with a Preliminary tag and 876 votes, it ranks second despite being a relatively new model from Meta Superintelligence Labs. Notably, pricing is listed as N/A: there’s no public API yet, and access is currently limited to select partners. For legal and government teams evaluating the best AI models for legal workflows, Muse Spark’s reasoning capabilities (powered by its Contemplating mode, which orchestrates parallel agents) are compelling. It also has the widest confidence interval and the fewest votes in the top 10, and the Preliminary tag signals that its score may shift as more votes accumulate, so keep watching this one.
3. claude-opus-4-6 — Rank #3
The non-thinking version of Anthropic’s Opus 4.6 holds the third spot with a score of 1509 (±12) and 2,733 votes, the most of any Claude model in the top 10. That large vote base and narrow interval make this one of the more statistically reliable results on the board. At $5 / $25 per million tokens, it matches the thinking variant in price but offers faster, more predictable responses. Legal teams that need one of the best AI models for legal document drafting, contract review, or regulatory research, without extended reasoning latency, will find this a strong default choice.
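Because neighbouring scores sit only a few points apart, it helps to check whether their confidence intervals actually overlap. The sketch below is a rough heuristic, not Arena’s own ranking method: it assumes each “±” figure in this article is a symmetric interval around the score, and the names `LEADERBOARD`, `overlaps`, and `separable_from` are hypothetical helpers introduced for illustration.

```python
# Scores and "±" half-widths copied from the leaderboard entries in this article.
# Assumption: each "±" is a symmetric confidence interval around the score.
LEADERBOARD = {
    "claude-opus-4-6-thinking": (1513, 13),
    "muse-spark": (1511, 21),
    "claude-opus-4-6": (1509, 12),
    "claude-opus-4-7-thinking": (1508, 16),
    "gemini-3-pro": (1502, 11),
    "gemini-3.1-pro-preview": (1499, 11),
    "claude-opus-4-7": (1488, 15),
    "gemini-3-flash": (1487, 13),
    "claude-opus-4-5-20251101 (thinking)": (1485, 12),
    "gpt-5.4-high": (1485, 14),
}


def interval(score: int, half_width: int) -> tuple[int, int]:
    """Return the (low, high) bounds of a score's confidence interval."""
    return score - half_width, score + half_width


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if the intervals of two (score, half_width) pairs share any point."""
    a_lo, a_hi = interval(*a)
    b_lo, b_hi = interval(*b)
    return a_lo <= b_hi and b_lo <= a_hi


def separable_from(model: str) -> list[str]:
    """Models whose interval does not overlap the given model's interval."""
    target = LEADERBOARD[model]
    return [
        name
        for name, entry in LEADERBOARD.items()
        if name != model and not overlaps(target, entry)
    ]


if __name__ == "__main__":
    for name in ("claude-opus-4-6-thinking", "claude-opus-4-6"):
        print(name, "->", separable_from(name))
```

With these figures, the leader’s interval separates it only from the two entries tied at 1485, while the non-thinking Opus 4.6 overlaps every other top-10 entry. Non-overlapping intervals suggest a genuine difference, but overlapping ones do not prove two models are equal.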
4. claude-opus-4-7-thinking — Rank #4
Scoring 1508 (±16) with 1,555 votes, claude-opus-4-7-thinking is the thinking variant of Claude Opus 4.7, Anthropic’s most recent flagship before the 4.8 release. It improves on Opus 4.6 in long-running agentic tasks, instruction-following, and vision, capabilities directly relevant to legal research and document analysis pipelines. Priced at $5 / $25 per million tokens, it trails Opus 4.6 thinking by just 5 points on the legal leaderboard. For organisations building AI into automated legal workflows, the improved agentic consistency of the 4.7 generation is worth attention.
5. gemini-3-pro — Rank #5
Google’s gemini-3-pro scores 1502 (±11) with 2,907 votes; its ±11 interval is the tightest in the top five, which gives the result a solid statistical foundation. At $2 input / $12 output per million tokens, it is priced well below the Anthropic models and is the cheapest top-five entry with published pricing. Google’s newer Gemini 3.1 Pro has also been deployed in real-world autonomous settings, a sign of practical reliability beyond benchmarks. For government agencies evaluating the best AI models for legal compliance and policy analysis at scale, the price-to-performance ratio here is hard to ignore.
6. gemini-3.1-pro-preview — Rank #6
The preview variant of Google’s updated Gemini 3.1 Pro scores 1499 (±11) and, with 3,326 votes, has more human preference data than any other model on this list, making it the most robustly evaluated entry. At $2 / $12 per million tokens, it matches gemini-3-pro’s pricing. Legal teams looking for the best AI models for legal research across large document sets will appreciate Gemini’s long context window, which can take in large case files or full regulatory frameworks in a single prompt. The preview tag indicates Google is still refining the model, so its score may improve.
7. claude-opus-4-7 — Rank #7
The standard (non-thinking) version of Claude Opus 4.7 scores 1488 (±15) with 1,647 votes and is priced at $5 / $25 per million tokens. Its rank spread of 1–49 is notably wide, so its placement is less certain than most on this list, likely because it’s newer and its vote base is still building. For legal teams already using Claude Opus 4.6, upgrading to 4.7 brings improvements in instruction-following and vision that matter for contract analysis and document comparison, although on this leaderboard it currently scores 21 points below the non-thinking Opus 4.6. It remains among the best AI models for legal agentic work, particularly multi-step automated workflows.
8. gemini-3-flash — Rank #8
gemini-3-flash scores 1487 (±13) with 2,313 votes, and at $0.50 input / $3 output per million tokens it’s by far the most affordable model in the top 10. For government procurement offices, public sector legal teams, or law firms running high-volume document processing, flash-class models are among the best AI models for legal tasks where cost per query matters as much as raw capability. Gemini 3 Flash trails the top model by just 26 points, a remarkably small gap for a model priced at one-tenth of the premium tier’s input rate and 12% of its output rate.
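To make the “one-tenth” comparison concrete, the sketch below estimates the cost of a high-volume review job at the per-million-token prices listed in this article. The workload (10,000 documents of roughly 20,000 input and 1,000 output tokens each) is a hypothetical assumption chosen for illustration, as is the helper name `workload_cost`.

```python
# Per-million-token prices (USD, input/output) as listed in this article.
PRICES = {
    "claude-opus-4-6": (5.00, 25.00),
    "gpt-5.4-high": (2.50, 15.00),
    "gemini-3-pro": (2.00, 12.00),
    "gemini-3-flash": (0.50, 3.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload at the model's per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical review job: 10,000 documents, each ~20,000 tokens in
    # and ~1,000 tokens out (for example, a summary plus flagged clauses).
    docs = 10_000
    tokens_in, tokens_out = docs * 20_000, docs * 1_000
    baseline = workload_cost("claude-opus-4-6", tokens_in, tokens_out)
    for model in PRICES:
        cost = workload_cost(model, tokens_in, tokens_out)
        print(f"{model:18} ${cost:>9,.2f}  ({cost / baseline:.0%} of Opus 4.6)")
```

Under those assumed volumes, Gemini 3 Flash comes to $130 against $1,250 for Opus 4.6, roughly a tenth of the cost. Thinking variants and high-effort modes generally bill their reasoning tokens as output, so real costs for those models can run higher than this simple estimate.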
9. claude-opus-4-5-20251101 (thinking) — Rank #9
This entry, a dated checkpoint of Claude Opus 4.5 with thinking enabled, scores 1485 (±12) with 2,672 votes at $5 / $25 per million tokens. The high vote count makes this a statistically solid result, and it’s notable that an older checkpoint remains competitive with newer releases. Claude Opus 4.5 made headlines when Anthropic reported that it outscored every human candidate on its performance engineering hiring exam, a signal of the model’s professional-grade reasoning. For teams locked into specific API versions for compliance or reproducibility reasons, this frozen, auditable release is one of the best AI models for legal workflows.
10. gpt-5.4-high — Rank #10
OpenAI rounds out the top 10 with gpt-5.4-high, scoring 1485 (±14) with 2,161 votes and priced at $2.50 input / $15 output per million tokens. GPT-5.4 launched in March 2026, and the “high” suffix denotes the model running at its high reasoning-effort setting. Its rank spread of 2–51 reflects considerable statistical uncertainty about where it truly places, which is worth factoring in for specialised legal use cases. It remains a strong option for legal teams already embedded in the OpenAI ecosystem via Microsoft Copilot or Azure OpenAI Service.
Key Takeaways
The legal and government leaderboard paints a clear picture: Anthropic’s Claude family holds five of the top 10 spots, including #1, but Google’s Gemini models offer a compelling cost-performance trade-off, especially Gemini 3 Flash. Meta’s Muse Spark is a credible entrant worth revisiting once a public API is available. For any organisation evaluating the best AI models for legal and government use in 2026, the top 10 are far closer together than the rankings might suggest: only 28 points separate #1 from #10. Context window, pricing structure, deployment flexibility, and data residency requirements will often matter more than marginal score differences when making a final call.
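One way to see how small a 28-point gap is: Arena-style leaderboards are generally described as Bradley-Terry ratings reported on an Elo-like scale. Assuming the usual convention (a logistic curve with base 10 and scale factor 400, which is an assumption rather than a figure published with this leaderboard), a score difference converts to an expected head-to-head preference rate. `win_probability` is a hypothetical helper name.

```python
def win_probability(score_a: float, score_b: float) -> float:
    """Expected share of head-to-head votes model A wins against model B,
    assuming an Elo-style logistic curve with base 10 and scale 400."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400))


if __name__ == "__main__":
    # Score pairs taken from the top-10 entries in this article.
    pairs = {
        "#1 vs #10 (28-point gap)": (1513, 1485),
        "#1 vs gemini-3-flash (26-point gap)": (1513, 1487),
    }
    for label, (a, b) in pairs.items():
        print(f"{label}: {win_probability(a, b):.1%}")
```

Under that assumption, the leader would be preferred over the #10 model in only about 54% of head-to-head votes, and Gemini 3 Flash would still win roughly 46% of its matchups against the top model.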