Speed is becoming a serious competitive differentiator in AI. The fastest AI models are no longer just a nerdy benchmark curiosity: they determine which tools feel snappy enough to use in real workflows and which feel like they’re making you wait. Data from the Artificial Analysis index puts numbers to the difference, and the gap between the top and bottom of the chart is more than 2x.

Fastest AI Models
1. GPT-oss 120B (High): 306 Tokens Per Second
OpenAI’s open-weight GPT-oss 120B on the high-compute tier outputs 306 tokens per second, which puts it meaningfully ahead of the rest of the field. Among the fastest AI models available today, this one sets the pace, though it’s worth noting that “high” tier pricing tends to reflect that performance.
2. GPT-oss 20B (High): 239 Tokens Per Second
The smaller variant in OpenAI’s open model series, GPT-oss 20B, clocks in at 239 tokens per second. OpenAI’s dominance at the top of the speed chart reflects its infrastructure scale. As companies reassess their AI spending, speed-per-dollar is becoming a metric that matters as much as raw output.
3. Google Gemini 3.5 Flash: 212 Tokens Per Second
Google’s recently released Gemini 3.5 Flash hits 212 tokens per second, making it one of the fastest AI models from a major lab not named OpenAI. Gemini 3.5 Flash is by far the most capable model on the list, which makes it a strong option for anyone looking for both speed and quality. The model is especially good at agentic tasks, though it is priced slightly higher than Gemini 3.1 Pro.
4. Alibaba Qwen3.7 Max: 211 Tokens Per Second
At 211 tokens per second, Alibaba’s Qwen3.7 Max is essentially neck-and-neck with Gemini 3.5 Flash. The fact that a Chinese model sits this high among the fastest AI models in a global benchmark is notable — Alibaba has been quietly building one of the more competitive model families outside the US-UK axis.
5. xAI Grok 4.3 (High): 190 Tokens Per Second
Elon Musk’s xAI lands in fifth with Grok 4.3 at 190 tokens per second. It’s a respectable number, though it trails the leaders by a meaningful margin. For users who prioritize speed, Grok sits in the middle of the pack: competitive, but not the first choice if throughput is the primary concern.
6. OpenAI GPT-5.4 Mini (xHigh): 173 Tokens Per Second
The GPT-5.4 Mini on the extra-high tier manages 173 tokens per second. Mini models are typically optimized for cost, so hitting this speed at a smaller footprint is the point. For businesses deploying AI at scale, mini-class models often deliver the best ROI on a per-token basis.
7. NVIDIA Nemotron 3 Super: 153 Tokens Per Second
NVIDIA enters the model race, not just the chip race, with Nemotron 3 Super at 153 tokens per second. It’s an interesting signal that the company best known for making the hardware that runs the fastest AI models is now competing on the models themselves.
8. Mistral Medium 3.5: 152 Tokens Per Second
France’s Mistral clocks 152 tokens per second with Medium 3.5. Mistral has carved out a reputation for efficient, open-weight models, and sitting near the middle of this chart while competing with much larger labs is a reasonable result. The fastest AI models from European labs remain a smaller slice of the market, but Mistral continues to hold its ground.
9. Google Gemini 3.1 Pro Preview: 137 Tokens Per Second
Google’s pro-tier preview model also makes the list with 137 tokens per second. Pro models prioritize quality; Flash models prioritize speed. For teams that need both, the fastest AI models from Google sit at different ends of that tradeoff.
10. AWS Nova 2.0 Pro Preview (Medium): 134 Tokens Per Second
Amazon rounds out the top ten with Nova 2.0 Pro Preview at 134 tokens per second. AWS entering the frontier model race matters for enterprise buyers already deep in the Amazon ecosystem. As AI infrastructure spending accelerates globally, having a competitive fast model on its own platform eases the friction of vendor lock-in for those buyers.
What The Rankings Tell Us
A few things stand out from this chart. First, OpenAI holds the top two slots by a wide margin: 306 and 239 tokens per second versus 212 for the third-place model. Second, the middle of the pack (190–152 tokens per second) is surprisingly crowded, with xAI, OpenAI mini, NVIDIA, and Mistral all bunched together. Third, Chinese models are genuinely competitive. Qwen3.7 Max at 211 tokens per second isn’t a consolation entry.
Speed alone doesn’t make a model worth using. But as enterprises get more deliberate about AI ROI, the fastest AI models that can also deliver quality output will have a real edge. The benchmark gap between the top and bottom of this list — 306 versus 134 tokens per second — is more than 2x. In a workflow that touches AI hundreds of times a day, that difference adds up fast.
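To make "adds up fast" concrete, here is a rough back-of-the-envelope sketch in Python. The two throughput figures come from the list above. The response length (500 tokens) and the number of calls per day (300) are illustrative assumptions, not benchmark data, and the calculation ignores time to first token and network latency.

```python
# Throughput figures (tokens per second) as quoted in the list above.
FASTEST_TPS = 306  # first place
SLOWEST_TPS = 134  # tenth place

# Illustrative assumptions, NOT from the benchmark: adjust to your workload.
TOKENS_PER_RESPONSE = 500
CALLS_PER_DAY = 300


def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Seconds spent streaming `tokens` output tokens at a given throughput."""
    return tokens / tokens_per_second


def daily_wait_minutes(
    tokens_per_second: float,
    tokens: int = TOKENS_PER_RESPONSE,
    calls: int = CALLS_PER_DAY,
) -> float:
    """Total minutes per day spent waiting on output generation."""
    return generation_seconds(tokens, tokens_per_second) * calls / 60


speedup = FASTEST_TPS / SLOWEST_TPS
fast_minutes = daily_wait_minutes(FASTEST_TPS)
slow_minutes = daily_wait_minutes(SLOWEST_TPS)

print(f"Speed ratio, first vs. tenth place: {speedup:.2f}x")
print(f"Daily wait at {FASTEST_TPS} tok/s: {fast_minutes:.1f} min")
print(f"Daily wait at {SLOWEST_TPS} tok/s: {slow_minutes:.1f} min")
print(f"Difference: {slow_minutes - fast_minutes:.1f} min per day")
```

Under these assumptions, the fastest model on the list spends roughly 8 minutes a day generating output and the slowest roughly 19. Longer responses or more calls widen that gap proportionally.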