ChatGPT Provides Most Left-Leaning Answers, Gemini 3.1 Pro Most Balanced: WSJ Study

Even as frontier chatbots converge in quality, they still differ sharply in how they respond to politically fraught issues.

A new analysis of AI model outputs across a range of political questions has found that OpenAI’s GPT-5.5 produced left-leaning arguments alone in 80% of its responses, the highest share of any major chatbot tested, while Google’s Gemini 3.1 Pro came out as the most evenly balanced of the lot.

The study looked at six models from six different companies — OpenAI’s GPT-5.5, DeepSeek’s V4 Pro, Gab’s Arya, Anthropic’s Claude Opus 4.8, xAI’s Grok 4.3, and Google’s Gemini 3.1 Pro — and measured how often each one’s answers contained only the left-leaning position, only the right-leaning position, or arguments from both sides on a set of contentious policy questions.

GPT-5.5 topped the left-leaning chart by a clear margin. Eighty percent of its responses presented the left-leaning argument exclusively, while both sides appeared only 17% of the time and the right-leaning position alone appeared in just 3% of cases. DeepSeek’s V4 Pro followed a similar but less extreme pattern: 70% left-only, 23% both sides, and 7% right-only.

Gab’s Arya, a chatbot built by a platform that markets itself as a haven for conservative voices online, still tilted left. It gave left-only arguments in 50% of responses against just 3% right-only, with the remaining 47% covering both sides. Anthropic’s Claude Opus 4.8 showed a similar split, giving left-only answers 43% of the time and presenting both sides in 57%, with no right-only responses at all.

Grok stood out as the only model to give the right-leaning position alone a meaningful share of its answers. Grok 4.3 produced left-only arguments 40% of the time, both sides 27% of the time, and right-only arguments in 33% of cases. That is by far the highest right-only share among the six models, and the closest any chatbot came to an even split across all three categories. xAI has spent much of the last two years positioning Grok as the truth-seeking, less-filtered alternative to chatbots it considers captured by mainstream sensibilities, and on this particular measure, the numbers support that positioning. Whether that’s a result of deliberate tuning or simply less aggressive guardrail training is something only xAI can answer with certainty, but Grok is the rare chatbot here that doesn’t pick a side and stay there.

Gemini 3.1 Pro sat at the opposite end of the spectrum from GPT-5.5. Google’s model gave a left-only answer just 7% of the time, and the remaining 93% of its responses presented both sides of the issue. No right-only answers were recorded. That makes Gemini the most balanced model in the dataset by a wide margin: its 93% both-sides rate is 36 percentage points higher than that of the next-closest model, Claude Opus 4.8, at 57%.

The pattern across the board is hard to miss. Every model tested, including the one built by a platform explicitly courting conservative users, produced left-only arguments more often than right-only ones. For four of the six models (GPT-5.5, DeepSeek V4 Pro, Arya, and Claude Opus 4.8), the gap between left-only and right-only answers was 43 percentage points or more. Only Grok came close to parity, and even Grok leaned left more than it leaned right, with 40% left-only answers against 33% right-only.

Questions about political bias in Wikipedia and other reference sources used to train these models have come up before, and the parallel is probably not a coincidence. Large language models are trained on enormous slices of the internet, and if that underlying text skews in a particular direction, a model tends to inherit some of that skew unless its developer actively corrects for it during fine-tuning. Google’s training and reinforcement learning process for Gemini appears to push harder toward neutrality on contested topics than its competitors’ processes do, while OpenAI’s approach seems to leave more of the underlying tilt intact.

For companies and developers building products on top of these models, the differences matter more than they might have a few years ago. AI chatbots are increasingly the first stop for people trying to understand a policy debate, a court ruling, or an election issue, and a model that consistently surfaces only one side of an argument is shaping opinion whether it intends to or not. As adoption climbs and these tools get embedded deeper into search, customer service, and everyday research, which model a company chooses to build on could end up saying as much about its values as it does about its tech stack.