Google Unveils 26B DiffusionGemma, Claiming 4x Faster Text Generation

3 articles · Updated · Computerworld · Jun 12

DiffusionGemma generates 256-token blocks in parallel instead of one token at a time, which Google says delivers up to 4x faster inference for local, low-latency text workloads.
The experimental open model uses diffusion-style iterative refinement, bidirectional attention and a 26B mixture-of-experts design that activates 3.8B parameters during inference.
A quantized version runs in 18GB of VRAM on high-end consumer GPUs such as Nvidia's RTX 5090. Google released the model under Apache 2.0 on Hugging Face, GitHub and cloud platforms.
Google says the model is aimed at interactive coding, editing and other non-linear tasks, but it concedes that the speed advantage diminishes in high-QPS cloud serving and that output quality trails standard Gemma 4.
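
The memory and parameter figures above can be sanity-checked with back-of-envelope arithmetic. The sketch below is illustrative only: the article does not state which quantization width Google used, so the 4-bit case is an assumption, and the helper names are hypothetical. It counts weight storage alone and ignores activations, caches and runtime overhead.

```python
# Back-of-envelope figures. The bit widths are assumptions for illustration;
# the article does not say which quantization format the 18GB figure refers to.
TOTAL_PARAMS = 26e9      # total parameters (from the article)
ACTIVE_PARAMS = 3.8e9    # parameters active during inference (from the article)
VRAM_BUDGET_GB = 18      # VRAM quoted for the quantized model (from the article)


def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def active_fraction(active: float, total: float) -> float:
    """Share of parameters that participate in each forward pass."""
    return active / total


if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_footprint_gb(TOTAL_PARAMS, bits)
        fits = "fits" if gb <= VRAM_BUDGET_GB else "does not fit"
        print(f"{bits:>2}-bit weights: {gb:.1f} GB ({fits} in {VRAM_BUDGET_GB} GB)")
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"active share per pass: {share:.1%}")
```

Under these assumptions, 16-bit weights would need about 52 GB and 4-bit weights about 13 GB, which would leave some headroom inside an 18GB budget. The mixture-of-experts design means roughly 15% of the parameters are used in any one pass, although all of them still have to be held in memory.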

DiffusionGemma: Google’s 4x Faster, 26B-Parameter Diffusion LLM Redefines Local Text Generation

Overview

Google unveiled DiffusionGemma in June 2026, an experimental open large language model that departs from conventional token-by-token text generation. Built on the Gemma 4 26B Mixture-of-Experts architecture, DiffusionGemma replaces autoregressive decoding with a diffusion-based approach: it drafts a 256-token block in parallel and refines it iteratively, activating only about 3.8B of its 26B parameters during inference. Google says this delivers up to 4x faster generation for local, interactive applications, though it acknowledges that output quality trails standard Gemma 4.
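
To make the contrast with token-by-token decoding concrete, here is a toy sketch of block-parallel iterative refinement. It is not Google's algorithm: the confidence-ranked unmasking schedule is a generic masked-diffusion-style scheme chosen for illustration, `toy_denoiser` is a hypothetical stand-in that simply reads the answer from a known target, and the 16-step setting is arbitrary. The only thing it demonstrates is how the number of sequential model passes differs between the two strategies.

```python
import math
import random

MASK = None  # marks a position that has not been committed yet


def toy_denoiser(block, target, rng):
    """Stand-in for one model forward pass: proposes a token and a confidence
    for every position at once. A real model would attend over the whole
    block; this toy just copies from `target` and draws random confidences."""
    return [(target[i], rng.random()) for i in range(len(block))]


def diffusion_decode(target, num_steps, seed=0):
    """Fill a block in at most `num_steps` parallel refinement passes."""
    rng = random.Random(seed)
    block = [MASK] * len(target)
    passes = 0
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(block) if tok is MASK]
        if not masked:
            break
        proposals = toy_denoiser(block, target, rng)
        passes += 1
        # Commit an even share of the remaining positions, most confident first.
        quota = math.ceil(len(masked) / (num_steps - step))
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:quota]:
            block[i] = proposals[i][0]
    return block, passes


def autoregressive_decode(target):
    """Baseline: one forward pass per generated token, left to right."""
    block, passes = [], 0
    for tok in target:
        block.append(tok)  # a real model would predict this from the prefix
        passes += 1
    return block, passes


if __name__ == "__main__":
    target = [f"tok{i}" for i in range(256)]  # one 256-token block
    ar_out, ar_passes = autoregressive_decode(target)
    df_out, df_passes = diffusion_decode(target, num_steps=16)
    assert ar_out == df_out == target
    print(f"autoregressive passes: {ar_passes}")
    print(f"diffusion passes:      {df_passes}")
```

The pass count is not the same thing as wall-clock speedup. Each refinement pass processes the entire block, so it costs more than a single autoregressive step, which is one reason a 16-fold drop in sequential passes in this toy should not be read as a 16x gain; Google's own figure is up to 4x.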

...