NVIDIA Releases 550B Nemotron 3 Ultra, Cutting AI Agent Costs by Up to 30%

3 articles · Updated · NVIDIA · Jun 4

Nemotron 3 Ultra is a 550B-parameter open Mixture-of-Experts model with 55B active parameters, aimed at long-running AI agents that need stronger reasoning and orchestration across many turns.
NVIDIA said the model delivers up to 5x higher throughput than comparable open models and can lower cost to task completion by as much as 30% by using fewer total tokens and fewer tokens per turn.
Reported benchmark results were 91% on PinchBench, 95% on RULER at 1M context, and 65% to 70.4% on SWE-bench Verified across multiple agent frameworks, positioning the model for coding, research and enterprise workflows.
The release also opens much of the training stack, including 10M new supervised fine-tuning samples, 1M new RL tasks and 15 new RL environments, while moving Nemotron models to the Linux Foundation's OpenMDW-1.1 license.
NVIDIA also launched a 4B Nemotron 3.5 Content Safety model and a multilingual Nemotron 3.5 ASR model with sub-100 ms latency, broadening the lineup for enterprise and voice-native agents.
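
To make the cost claim concrete, here is a minimal sketch of how fewer total tokens and fewer tokens per turn translate into a lower cost per completed task. Every number in it (turn counts, tokens per turn, price) is a hypothetical placeholder chosen for illustration, not a figure from NVIDIA, and the function name is an assumption of this sketch.

```python
def cost_to_completion(turns: int, tokens_per_turn: int, price_per_million_tokens: float) -> float:
    """Cost of one finished agent task under a flat per-token price."""
    total_tokens = turns * tokens_per_turn
    return total_tokens * price_per_million_tokens / 1_000_000


# Hypothetical workloads: same task, same price, different token usage.
PRICE = 2.0  # assumed dollars per million tokens, for illustration only

baseline = cost_to_completion(turns=40, tokens_per_turn=5_000, price_per_million_tokens=PRICE)
leaner = cost_to_completion(turns=35, tokens_per_turn=4_000, price_per_million_tokens=PRICE)

# Relative saving when the agent finishes in fewer, shorter turns.
savings = 1 - leaner / baseline

print(f"baseline: ${baseline:.2f}, leaner: ${leaner:.2f}, savings: {savings:.0%}")
```

With these placeholder inputs the leaner run uses 140,000 tokens instead of 200,000, a 30% reduction. The point is only that cost per task scales with total tokens consumed, so savings can come from fewer turns, shorter turns, or both.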

Sources

NVIDIA: NVIDIA Releases Nemotron 3 Ultra 550B Model for Faster, Cost-Efficient AI Agent Reasoning

build.nvidia.com: nemotron-3-ultra-550b-a55b Model by NVIDIA | NVIDIA NIM

YouTube: Introducing NVIDIA Nemotron 3 Ultra: An Open 550B Model for Long-Running Agents

Nemotron 3 Ultra (550B) Unveiled: NVIDIA’s Open-Weight AI Breakthrough for Enterprise, Agents, and the US-China Race

Overview

NVIDIA released Nemotron 3 Ultra in late May or early June 2026 as an open-weight model. It uses a Mixture-of-Experts architecture with 550 billion total parameters, of which 55 billion are active, and a 1 million token context window. The design targets advanced reasoning, long-running agentic workflows, and complex planning, with applications in coding, research, and enterprise environments. Nemotron 3 Ultra supports multi-GPU deployment, and NVIDIA publishes the weights, data, and training recipes so that developers can customize and deploy the model across platforms.
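
A rough back-of-the-envelope sketch shows why a model of this size calls for multi-GPU deployment. It estimates the memory needed for the weights alone at a few common precisions and the minimum number of GPUs that implies. The precisions and the 80 GB per-GPU capacity are assumptions of this sketch, not details from the release, and the estimate ignores the KV cache, activations, and runtime overhead, so real requirements would be higher.

```python
import math

TOTAL_PARAMS = 550e9   # total parameters, from the article
ACTIVE_PARAMS = 55e9   # active parameters, from the article

# Assumed storage formats; the article does not say which precision the weights ship in.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "4-bit": 0.5}

GPU_MEMORY_GB = 80  # assumed per-GPU memory, for illustration only


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def min_gpus_for_weights(params: float, bytes_per_param: float,
                         gpu_memory_gb: float = GPU_MEMORY_GB) -> int:
    """Lower bound on GPU count: weights only, no KV cache or activations."""
    return math.ceil(weight_memory_gb(params, bytes_per_param) / gpu_memory_gb)


# Share of the parameters that take part in a forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"active fraction: {active_fraction:.0%}")
for name, nbytes in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
    gpus = min_gpus_for_weights(TOTAL_PARAMS, nbytes)
    print(f"{name}: {gb:.0f} GB of weights, at least {gpus} GPUs of {GPU_MEMORY_GB} GB")
```

Under these assumptions the weights take about 1,100 GB at 16-bit precision (at least 14 GPUs), 550 GB at 8-bit (at least 7), and 275 GB at 4-bit (at least 4). Only about 10% of the parameters are active at a time, which reduces compute per token, but in a typical Mixture-of-Experts deployment all experts still have to be held in memory.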

...