Updated · NVIDIA · Jun 4
NVIDIA Releases 550B Nemotron 3 Ultra, Cutting AI Agent Costs by Up to 30%

3 articles · Updated · NVIDIA · Jun 4

Summary

  • Nemotron 3 Ultra is a 550B-parameter open Mixture-of-Experts model with 55B active parameters, aimed at long-running AI agents that need stronger reasoning and orchestration across many turns.
  • NVIDIA said the model delivers up to 5x higher throughput than comparable open models and can lower cost to task completion by as much as 30% by using fewer total tokens and fewer tokens per turn.
  • Benchmark results showed 91% on PinchBench, 95% on RULER at 1M context, and SWE-bench Verified scores of 65% to 70.4% across multiple agent frameworks, positioning the model for coding, research and enterprise workflows.
  • The release also opens much of the training stack, including 10M new supervised fine-tuning samples, 1M new RL tasks and 15 new RL environments, while moving Nemotron models to the Linux Foundation's OpenMDW-1.1 license.
  • NVIDIA also launched a 4B Nemotron 3.5 Content Safety model and a multilingual Nemotron 3.5 ASR model with sub-100 ms latency, broadening the lineup for enterprise and voice-native agents.
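
The cost claim in the summary rests on simple arithmetic: if an agent finishes a task in fewer turns and emits fewer tokens per turn, the two reductions multiply. The sketch below illustrates that relationship under a flat per-token price. The function names, the price, and the turn and token counts are all hypothetical, chosen only so the result lands near the "up to 30%" figure; they are not NVIDIA measurements.

```python
import math


def cost_to_completion(turns: int, tokens_per_turn: int,
                       usd_per_million_tokens: float) -> float:
    """Cost of one agent task, assuming a flat price per token (an assumption)."""
    total_tokens = turns * tokens_per_turn
    return total_tokens / 1_000_000 * usd_per_million_tokens


def relative_saving(baseline_cost: float, candidate_cost: float) -> float:
    """Fraction of the baseline cost that the candidate saves."""
    return (baseline_cost - candidate_cost) / baseline_cost


# Hypothetical workloads: same price, but the candidate needs fewer turns
# and fewer tokens per turn than the baseline.
baseline = cost_to_completion(turns=40, tokens_per_turn=5_000,
                              usd_per_million_tokens=1.0)
candidate = cost_to_completion(turns=35, tokens_per_turn=4_000,
                               usd_per_million_tokens=1.0)

print(f"baseline:  ${baseline:.2f}")
print(f"candidate: ${candidate:.2f}")
print(f"saving:    {relative_saving(baseline, candidate):.0%}")
```

Because the price is held equal, the saving here comes entirely from token counts. A real comparison would also need each model's actual per-token price and task success rate.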

Insights

NVIDIA claims its AI cuts agent costs by 30%, but what are the hidden hardware demands behind this powerful new model?
Is NVIDIA's 'open' AI model a true gift, or a strategy to lock users into its expensive hardware?
As NVIDIA open-sources a powerful AI, will the business models of closed-source labs like OpenAI and Anthropic collapse?

Nemotron 3 Ultra (550B) Unveiled: NVIDIA’s Open-Weight AI Breakthrough for Enterprise, Agents, and the US-China Race

Overview

NVIDIA released Nemotron 3 Ultra in late May or early June 2026 as an open-weight model. It uses a Mixture-of-Experts architecture with 550 billion total parameters, of which 55 billion are active, and supports a 1 million token context window. The design targets advanced reasoning, long-running agentic workflows, and complex planning, with coding, research, and enterprise applications as the main use cases. Nemotron 3 Ultra supports multi-GPU deployment, and NVIDIA is releasing the weights, data, and recipes so that developers can customize and deploy the model across platforms.
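
The parameter counts above give a rough sense of why multi-GPU deployment matters. In a Mixture-of-Experts model only the active parameters take part in each forward step, but the full set of weights typically still has to be held in memory. The sketch below estimates the weight footprint alone at several storage precisions. The precisions and the 80 GB per-GPU figure are assumptions for illustration, not published deployment requirements, and the estimate ignores KV cache, activations, and runtime overhead.

```python
import math

TOTAL_PARAMS = 550e9   # total parameters, from the article
ACTIVE_PARAMS = 55e9   # active parameters, from the article


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights only, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def min_gpus_for_weights(params: float, bytes_per_param: float,
                         gpu_memory_gb: float) -> int:
    """Lower bound on GPU count: weights only, no cache or activations."""
    return math.ceil(weight_memory_gb(params, bytes_per_param) / gpu_memory_gb)


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active fraction: {active_fraction:.0%}")

# Hypothetical storage precisions and a hypothetical 80 GB accelerator.
for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gb = weight_memory_gb(TOTAL_PARAMS, bytes_per_param)
    gpus = min_gpus_for_weights(TOTAL_PARAMS, bytes_per_param, gpu_memory_gb=80.0)
    print(f"{label}: {gb:,.0f} GB of weights, at least {gpus} GPUs of 80 GB")
```

These figures are lower bounds. A 1 million token context adds substantial cache memory on top of the weights, so practical deployments would likely need more headroom.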

...