NVIDIA Releases Nemotron 3, a Model as Capable as GPT-OSS and Qwen but Much Faster

NVIDIA builds much of the hardware that powers the training of AI models, and now it's producing some very capable models of its own.

The chip giant today announced the release of Nemotron 3, a new family of open models designed specifically for agentic AI applications. The launch begins with Nemotron 3 Nano, a 30B-parameter model that delivers competitive accuracy while running 1.5 to 3.3 times faster than comparable open-source alternatives. NVIDIA has also released the model weights, the training recipe, and all the training data for which it holds redistribution rights, making this one of the most open models around.

Pushing the Efficiency Frontier

Nemotron 3 Nano stands out for its hybrid architecture. The model combines Mamba state-space layers with Transformer mixture-of-experts layers, activating just 3.2 billion of its 31.6 billion total parameters during inference. This design allows it to match or exceed the performance of larger models like GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507 while delivering substantially higher throughput.
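The sparse-activation idea behind mixture of experts can be illustrated with a minimal top-k routing sketch. The dimensions, router, and expert shapes below are purely illustrative, not Nemotron's actual architecture: a small router scores every expert for each token, but only the top-k experts actually run, so most parameters sit idle on any given step.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through only its top-k experts (sparse activation).

    x       : (d,) token hidden state
    gate_w  : (n_experts, d) router weights
    experts : list of (d, d) expert weight matrices
    """
    logits = gate_w @ x                    # one router score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts only
    # Only top_k expert matrices are touched; the rest contribute no compute.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
gate_w = rng.standard_normal((n_experts, d))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, top_k=2)
```

With 2 of 16 experts active, only about an eighth of the expert parameters are exercised per token, which is the same mechanism that lets Nemotron 3 Nano activate 3.2B of 31.6B parameters.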

According to NVIDIA’s benchmarks, Nemotron 3 Nano achieves 89.1% accuracy on AIME25 math problems and 67.7% on general chat tasks. More importantly for production deployments, it processes 3.3 times more tokens per second than Qwen3-30B-A3B and 2.2 times more than GPT-OSS-20B when handling 8K-input, 16K-output sequences on a single H200 GPU.

The model also supports context lengths up to 1 million tokens and outperforms both GPT-OSS-20B and Qwen3-30B-A3B-Instruct-2507 on long-context understanding tasks.

A Roadmap for Larger Models

NVIDIA has outlined plans for two additional models in the Nemotron 3 family, both expected in the coming months. Nemotron 3 Super will be approximately four times larger than Nano, optimized for collaborative agent workflows and high-volume tasks like IT ticket automation. Nemotron 3 Ultra, sixteen times larger than Nano, aims to provide state-of-the-art reasoning performance.

These upcoming models will incorporate several advanced technologies. LatentMoE, a novel hardware-aware expert design, lets the models use four times more experts at the same inference cost. Multi-token prediction layers improve both generation efficiency and model quality for long-form text. The models are also trained using NVFP4, NVIDIA’s new low-precision format.
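The article doesn't detail NVFP4's exact layout, but the general idea behind blocked 4-bit floating-point training can be sketched as follows. This is a rough illustration only: it assumes an E2M1-style 4-bit value grid and one shared scale per 16-element block, both of which are assumptions about how such a format typically works, not a description of NVFP4 itself.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float (sign + 2 exponent + 1 mantissa bits).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_block(block):
    """Quantize one block of weights to 4-bit values plus one shared scale.

    The scale maps the block's largest magnitude onto 6.0 (the largest FP4
    value); every element then snaps to its nearest grid point.
    """
    scale = np.abs(block).max() / FP4_GRID[-1]
    if scale == 0:
        return np.zeros_like(block), 0.0
    scaled = block / scale
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID).argmin(axis=1)
    q = np.sign(scaled) * FP4_GRID[idx]    # in practice, stored as 4-bit codes
    return q, scale

rng = np.random.default_rng(1)
w = rng.standard_normal(16)        # one 16-element block of weights
q, scale = quantize_fp4_block(w)
w_hat = q * scale                  # dequantized approximation of w
err = np.abs(w - w_hat).max()
```

The payoff of any such scheme is memory and bandwidth: each weight costs 4 bits plus a small amortized share of the per-block scale, at the price of the bounded rounding error measured above.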

Open Development Approach

In a notable move toward transparency, NVIDIA is releasing not just the Nemotron 3 Nano model weights, but also the training recipe and all redistribution-permitted data used to develop it. This level of openness is relatively uncommon among major AI companies and positions Nemotron 3 as what NVIDIA claims will be “one of the most openly developed model families in the industry.”

The models employ multi-environment reinforcement learning during post-training, exposing them to diverse scenarios to improve accuracy across different tasks. They also support granular reasoning budget control at inference time, allowing developers to adjust computational effort based on specific requirements.

Strategic Positioning

With agentic AI applications gaining traction across enterprise use cases, the efficiency gains offered by Nemotron 3’s architecture could prove compelling for organizations looking to deploy AI agents at scale without proportionally scaling infrastructure costs. The release also represents NVIDIA’s evolution from pure hardware provider to a more integrated AI platform company. By developing models optimized for its own chips, NVIDIA can demonstrate the full capabilities of its hardware while providing reference implementations for customers building their own AI systems.

Posted in AI