OpenAI rearchitects WebRTC stack for low-latency voice AI at scale
Updated · OpenAI · May 4
The redesign supports more than 900 million weekly active users, powering ChatGPT voice, the Realtime API and research projects through a split relay and transceiver architecture.
OpenAI said the system uses a small fixed public UDP footprint, ICE ufrag-based first-packet routing and globally distributed relay ingress to cut setup time, jitter and packet loss.
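Ufrag-based first-packet routing works because the first packet a client sends over a new ICE candidate pair is a STUN Binding Request whose USERNAME attribute carries the ufrag pair, so a relay can pick the right backend before any media flows. Below is a minimal sketch of parsing that attribute; the wire format follows RFC 5389/8445, but using it for backend dispatch is an assumption about OpenAI's unpublished internals, not their actual code:

```python
import struct

STUN_MAGIC_COOKIE = 0x2112A442
STUN_BINDING_REQUEST = 0x0001
ATTR_USERNAME = 0x0006  # value is "local-ufrag:remote-ufrag" in ICE checks

def extract_ufrag(packet: bytes):
    """Return the first ufrag from a STUN Binding Request's USERNAME
    attribute, or None if the packet is not a valid Binding Request."""
    if len(packet) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack_from("!HHI", packet, 0)
    if msg_type != STUN_BINDING_REQUEST or cookie != STUN_MAGIC_COOKIE:
        return None
    offset, end = 20, min(20 + msg_len, len(packet))
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", packet, offset)
        offset += 4
        if attr_type == ATTR_USERNAME:
            username = packet[offset:offset + attr_len].decode("utf-8", "replace")
            return username.split(":", 1)[0]
        offset += attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None
```

A relay ingress could map the extracted ufrag to the session that negotiated it and forward all subsequent packets from that 5-tuple to the same backend, which is what lets a small fixed set of public UDP ports front many sessions.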
The company said the approach keeps standard WebRTC behaviour for clients while letting Kubernetes-based infrastructure scale securely without exposing thousands of UDP ports or making backend services act as WebRTC peers.
OpenAI’s Voice AI Breakthrough: Achieving Ultra-Low Latency of 150–300 ms with WebRTC
Overview
OpenAI reworked its voice AI by replacing slow multi-step pipelines with a single unified model that processes audio end to end, achieving ultra-low latency: first text responses in 150–250 ms and speech in 220–400 ms. The gains were enabled by adopting WebRTC for fast, secure streaming and by integrating advanced audio processing such as voice activity detection and natural turn-taking. Scalable infrastructure built on Selective Forwarding Units supports thousands of concurrent users and multimodal inputs, powering real-world applications in customer support, telehealth, and education. While the premium API offers unmatched speed and expressiveness, alternatives trade latency for lower cost and flexibility. Together these advances are driving rapid market growth and changing how humans interact with AI voice agents.
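Voice activity detection and turn-taking are what let the model decide when a speaker has finished so it can respond without an awkward pause. The sketch below shows the idea with a simple energy-threshold VAD and a "hangover" endpointer; the threshold, frame size, and hangover count are illustrative assumptions, and production systems (including, per the summary above, OpenAI's) use learned models rather than fixed thresholds:

```python
import math

SAMPLE_RATE = 16000
FRAME_MS = 20
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples per frame

def is_speech(frame, threshold=500.0):
    """Flag a frame of 16-bit PCM samples as speech when its RMS
    amplitude exceeds a fixed threshold (illustrative value)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold

def endpoint(frames, hangover=3):
    """Return True once `hangover` consecutive silent frames follow
    speech, i.e. the speaker has likely finished their turn."""
    silent_run, seen_speech = 0, False
    for frame in frames:
        if is_speech(frame):
            seen_speech, silent_run = True, 0
        else:
            silent_run += 1
            if seen_speech and silent_run >= hangover:
                return True
    return False
```

With 20 ms frames and a hangover of 3, the end of a turn is detected roughly 60 ms after the last speech frame, which is how end-to-end response latencies in the low hundreds of milliseconds become feasible.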