Google Releases Gemini 3.1 Flash-Lite, Faster and Cheaper Than Gemini 2.5 Flash

As Google CEO Sundar Pichai predicted, AI models continue to get faster and cheaper than their predecessors.

Google DeepMind has introduced Gemini 3.1 Flash-Lite, the newest addition to its Gemini 3 model family, designed to deliver high performance and efficiency for large-scale workloads. According to Google, the model offers significant speed gains and cost reductions compared to its predecessor, Gemini 2.5 Flash, marking a step forward in making AI more accessible for developers and enterprises.

The numbers tell the story. In benchmarks shared by Google, Gemini 3.1 Flash-Lite achieved an output speed of 363 tokens per second, outperforming Gemini 2.5 Flash's 249 tokens per second, a roughly 46 percent improvement. Even more striking, Gemini 3.1 Flash-Lite delivers this performance at a lower cost: $0.25 per million input tokens and $1.50 per million output tokens, compared with Gemini 2.5 Flash's $0.30 and $2.50, respectively. That combination of higher speed and lower price positions 3.1 Flash-Lite as one of the most cost-efficient large language models currently available to developers.
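For a rough sense of what those per-token prices mean in practice, the Python sketch below compares the quoted list prices over a hypothetical workload of 10 million input tokens and 2 million output tokens. The workload size is an illustrative assumption, not a figure from Google.

```python
# Back-of-envelope cost comparison using the per-million-token prices quoted above.
# The 10M-input / 2M-output workload is an illustrative assumption, not a benchmark.

PRICES = {
    "gemini-3.1-flash-lite": {"input": 0.25, "output": 1.50},  # $ per 1M tokens
    "gemini-2.5-flash":      {"input": 0.30, "output": 2.50},
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars for a given token volume."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

if __name__ == "__main__":
    workload = {"input_tokens": 10_000_000, "output_tokens": 2_000_000}
    for model in PRICES:
        print(f"{model}: ${workload_cost(model, **workload):.2f}")
    # gemini-3.1-flash-lite: $5.50 vs gemini-2.5-flash: $8.00, about 31% cheaper
```

On this assumed mix, the newer model comes out roughly 31 percent cheaper; workloads that skew more heavily toward output tokens would see a larger gap, since that is where the price difference is widest.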

Beyond the raw numbers, Google emphasizes adaptability. The model introduces “thinking levels,” a feature that allows developers to control the depth of reasoning depending on the complexity of a task. Lighter workloads such as translation or moderation can be handled at lower cost and latency, while more demanding jobs—like simulation generation or UI design—can leverage deeper reasoning capabilities. The feature is available directly in AI Studio and Vertex AI, Google’s developer and enterprise AI platforms.
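For developers, a minimal sketch of how reasoning depth can be dialed per request with the google-genai Python SDK follows. The current SDK exposes a ThinkingConfig with a thinking_budget knob; whether the new "thinking levels" map to that exact field, and the gemini-3.1-flash-lite model id itself, are assumptions made here for illustration.

```python
# Sketch: selecting reasoning depth per request with the google-genai SDK.
# The model id and the mapping of "thinking levels" onto thinking_budget are
# assumptions based on the announcement, not confirmed details.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

# Light task: minimize reasoning to keep latency and cost down.
resp = client.models.generate_content(
    model="gemini-3.1-flash-lite",  # assumed preview model id
    contents="Translate to French: 'The order has shipped.'",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(resp.text)

# Heavier task: allow more reasoning tokens for a multi-step job.
resp = client.models.generate_content(
    model="gemini-3.1-flash-lite",
    contents="Design a minimal UI layout spec for a two-step checkout flow.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=2048),
    ),
)
print(resp.text)
```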

In early testing, companies such as Latitude, Cartwheel, and Whering have already integrated 3.1 Flash-Lite into production workflows. Testers reportedly found that the model performs with the precision of higher-tier systems, following complex instructions and maintaining contextual understanding across long sessions—all while keeping inference times low enough for real-time use cases.

Gemini 3.1 Flash-Lite’s performance has been validated through independent benchmarking as well. It scored an Elo of 1432 on Arena.ai and posted strong results on reasoning and multimodal benchmarks like GPQA Diamond (86.9%) and MMMU Pro (76.8%). That puts it ahead of comparable models in its class, including several larger Gemini 2.5 variants.

[Image: Gemini 3.1 Flash-Lite benchmark results]

For developers, the implications are straightforward: lower latency, lower cost, and higher scalability. For enterprises, it signals a maturing ecosystem where high-volume AI applications—customer support, content creation, data analysis—can be delivered with smaller infrastructure footprints.

Gemini 3.1 Flash-Lite is now available in preview through the Gemini API in Google AI Studio and Vertex AI. With its blend of speed, reasoning flexibility, and affordability, it represents a clear move toward AI systems optimized not just for intelligence, but for efficiency at scale.
