Google's Gemini 3 Flash Leads Vercel Token Usage as Spend Share Climbs to 21%

4 articles · Updated · Business Insider · May 15
  • Early April data from Vercel's AI Gateway showed Gemini 3 Flash overtaking Anthropic in token traffic and holding the lead through April, signaling broader customer adoption ahead of Google I/O next week.
  • Gemini Flash gained traction because it is faster and cheaper than full-size models, with Vercel CEO Guillermo Rauch saying enterprise teams favor low-cost, low-latency options and B2C users value its lower hallucination rate.
  • Anthropic still captured 61% of April model spending on Vercel, underscoring that token-volume leadership and revenue leadership diverged as customers used different models for different workloads.
  • Google's spend share rose to 21% from 8% in March as Flash scaled, while OpenAI's share tripled to 12% after its GPT-5.4 and 5.5 launches, highlighting how quickly model rankings can shift.
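The spend-share shifts in the bullets above come down to simple ratio arithmetic. A minimal sketch — only the percentage shares are from the report; the absolute dollar total is hypothetical:

```python
# Hypothetical total April model spend on Vercel's AI Gateway (USD).
# Only the percentage shares below are from the reported figures.
total_spend = 1_000_000

march_shares = {"google": 0.08}
april_shares = {"anthropic": 0.61, "google": 0.21, "openai": 0.12}

# Google's share grew from 8% to 21% month over month (~2.6x).
google_growth = april_shares["google"] / march_shares["google"]
print(f"Google spend-share growth: {google_growth:.2f}x")

# Despite Google's token-volume lead, Anthropic still captures the
# majority of revenue: 61% of the (hypothetical) total.
anthropic_spend = total_spend * april_shares["anthropic"]
print(f"Anthropic April spend: ${anthropic_spend:,.0f}")  # → $610,000
```

This also illustrates how volume leadership and revenue leadership can diverge: a cheap model can dominate token traffic while a pricier model keeps the larger share of spend.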

Gemini 3 Flash Surpasses Anthropic on Vercel AI Gateway: How Google’s Model Became the Top Choice by Token Volume in April 2026

Overview

In April 2026, Google's Gemini 3 Flash became the leading AI model by token volume on Vercel's AI Gateway, marking a major shift in the industry. The rise was driven by Gemini 3 Flash's advantage over Anthropic's models on latency and cost-per-token benchmarks. As a result, developers and enterprises now favor it for high-volume, agentic, and cost-sensitive workloads that must process large numbers of tokens without incurring high costs.
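Why cost-per-token dominates model choice at high volume can be seen with a quick comparison. The per-million-token rates below are illustrative assumptions, not published pricing:

```python
# Illustrative (made-up) output rates in USD per million tokens,
# contrasting a small "flash-class" model with a full-size model.
rates_per_m = {"flash-class model": 0.30, "full-size model": 3.00}

tokens = 500_000_000  # a high-volume agentic workload

for name, rate in rates_per_m.items():
    cost = tokens / 1_000_000 * rate
    print(f"{name}: ${cost:,.2f}")
# At 500M tokens, a 10x rate gap becomes a $1,350 difference:
# $150.00 vs $1,500.00.
```

At this scale, even a modest per-token discount compounds into a decisive cost advantage, which is consistent with the report's claim that low-cost, low-latency models win high-volume workloads.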

...