NVIDIA says GB300 NVL72 systems built on Blackwell Ultra deliver up to 50 times more tokens per megawatt and up to 35 times lower cost per token than Hopper-based systems.
Those gains target always-on inference workloads in “AI factories,” where economics hinge on tokens per second, tokens per watt, utilization and uptime rather than raw chip speed alone.
NVIDIA said its Dynamo framework helps keep compute, memory, networking and storage fully utilized under long-context, multi-agent inference workloads, supporting higher real-time output.
The company framed Blackwell Ultra as part of a broader full-stack push into enterprise AI infrastructure, working with partners including Cisco, Dell, HPE, Lenovo and Supermicro.
NVIDIA also pointed to its next Vera Rubin platform, saying it is designed to raise performance per watt by up to 35 times again as reasoning and agentic AI scale.
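The tokens-per-megawatt and cost-per-token figures above are related but not interchangeable, because cost per token also carries amortized hardware and facility cost, not just energy. The sketch below is a rough illustration of that relationship. Every number in it (throughput, energy price, hourly amortized cost) is a hypothetical placeholder, not an NVIDIA figure, and the function name `cost_per_million_tokens` is invented for this example.

```python
def cost_per_million_tokens(tokens_per_sec_per_mw, energy_usd_per_mwh,
                            amortized_usd_per_mw_hour, utilization=1.0):
    """Return USD per one million tokens for 1 MW of deployed capacity.

    All inputs are assumptions chosen by the caller:
      tokens_per_sec_per_mw      sustained throughput per megawatt
      energy_usd_per_mwh         electricity price
      amortized_usd_per_mw_hour  hardware + facility cost spread per hour
      utilization                fraction of time spent serving tokens
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_sec_per_mw * 3600 * utilization
    hourly_cost = energy_usd_per_mwh + amortized_usd_per_mw_hour
    return hourly_cost / tokens_per_hour * 1_000_000


# Hypothetical baseline system (placeholder values).
baseline = cost_per_million_tokens(10_000, 80, 2_000)

# Hypothetical newer system: 50x the throughput per megawatt,
# but a higher amortized hardware cost per hour.
newer = cost_per_million_tokens(10_000 * 50, 80, 3_000)

# The cost advantage is smaller than the throughput advantage
# because the hourly cost of running the system went up.
cost_advantage = baseline / newer
print(f"baseline: ${baseline:.2f} per million tokens")
print(f"newer:    ${newer:.2f} per million tokens")
print(f"cost advantage: {cost_advantage:.1f}x")
```

Under these made-up inputs, a 50-fold gain in tokens per megawatt yields a smaller gain in cost per token, which is one plausible reason two such headline figures can differ. Lowering `utilization` raises the cost per token proportionally, which is why utilization and uptime matter alongside raw throughput.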
NVIDIA Blackwell Ultra and Rubin: Redefining AI Token Economics and Enterprise Deployment
Overview
NVIDIA's Blackwell Ultra (B300) is the company's data center GPU aimed at large-scale AI inference and high-performance computing. Built on TSMC's 4NP process, it delivers 15 petaFLOPS of dense FP4 performance using NVIDIA's NVFP4 format. Each GPU carries 288 GB of HBM3e memory with 8 TB/s of bandwidth, and fifth-generation NVLink connects 72 GPUs into a single GB300 NVL72 system. That combination of low-precision compute, memory capacity and interconnect is what lets Blackwell Ultra serve demanding inference workloads, and it is why NVIDIA positions the chip for enterprises moving from experimental projects to large-scale AI deployment.
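To give a sense of what 288 GB per GPU means for 4-bit models, here is a back-of-the-envelope sketch of weight storage. It counts model weights only and ignores the KV cache, activations and runtime overhead, so real limits are lower. The 4.5-bit figure is an assumed effective width that budgets for per-block scale factors in a block-scaled 4-bit format. The helper names and the 70-billion-parameter model are hypothetical.

```python
GPU_MEMORY_GB = 288   # per-GPU capacity stated in the article
GPUS_PER_SYSTEM = 72  # GPUs in one NVL72 system, per the article


def weights_footprint_gb(num_params, bits_per_weight):
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def max_params_in_memory(capacity_gb, bits_per_weight, reserve_fraction=0.0):
    """Largest parameter count whose weights fit in the given capacity.

    reserve_fraction is the share of memory set aside for everything
    that is not weights (KV cache, activations); it is an assumption.
    """
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_bytes = capacity_gb * 1e9 * (1 - reserve_fraction)
    return usable_bytes * 8 / bits_per_weight


# A hypothetical 70-billion-parameter model at a plain 4 bits per weight.
print(weights_footprint_gb(70e9, 4), "GB")

# Upper bound on parameters per GPU at 4 bits, and at an assumed
# 4.5 effective bits that leaves room for scale factors.
print(max_params_in_memory(GPU_MEMORY_GB, 4) / 1e9, "billion params")
print(max_params_in_memory(GPU_MEMORY_GB, 4.5) / 1e9, "billion params")

# Aggregate GPU memory across one 72-GPU system.
system_memory_gb = GPU_MEMORY_GB * GPUS_PER_SYSTEM
print(system_memory_gb, "GB across the system")
```

In practice a serving stack reserves a large share of memory for the KV cache, especially with long contexts, so passing a nonzero `reserve_fraction` gives a more realistic ceiling.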