Google Ranks Gemini 3.5 Flash 6th on Android Bench as Cost Hits $147.1 a Run

Q: Why did Google's cheaper AI model cost about three times as much as its predecessor for the same Android coding tasks?

The cost was driven less by the model's advertised price than by how many tokens Gemini 3.5 Flash consumed to finish the Android Bench tasks. In Google’s own benchmark data, Gemini 3.5 Flash averaged 355.9 tokens per run versus 73.3 for Gemini 3.1 Pro Preview, nearly five times as many, while also scoring lower (63.7 versus 72.4). That heavier token use pushed the total benchmark cost to about $147.1 per run, roughly three times the older model’s $47.9.

The results suggest Gemini 3.5 Flash was less efficient for this specific workload. Android Bench measures real Android coding cases involving Kotlin, Jetpack Compose, Coroutines, Room, Hilt and system APIs, so a model that needs more attempts, more verbose output, or more internal reasoning to solve the same issue becomes more expensive even if it is marketed as “cheaper” in general.

Broader evidence on agentic coding helps explain this. Newer coding models often do more planning, iteration, tool use, and self-correction within a task, and those extra loops can sharply raise token counts and latency. In coding, output tokens and repeated context also add cost quickly, especially when the model is not well matched to the harness or domain.

So the gap is less a pricing contradiction than a task-efficiency problem: Gemini 3.5 Flash may be cheaper or stronger on other benchmarks, but on Android coding it used far more tokens, took longer, and delivered a lower success rate, making it substantially more expensive per completed task.

2 articles · Updated · 9to5Google · Jun 12

Gemini 3.5 Flash scored 63.7 on Google’s latest Android Bench, placing 6th and landing just outside the top five in Android coding despite being pitched as a faster, cheaper model.
Google’s data showed the model was the most resource-intensive among the top contenders, averaging 355.9 tokens and $147.1 per benchmark run, with a reported latency of 14.2.
Gemini 3.1 Pro Preview outperformed it at 72.4, using 73.3 tokens and costing $47.9, while GPT 5.5 led the ranking with a score of 74.
The upper tier has changed little since earlier Android Bench updates, suggesting Google’s newest Flash model improved more on other tasks than on Android development.
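
The comparison between the two Gemini models can be checked with simple arithmetic. The sketch below is only illustrative: it uses the scores, token averages, and per-run costs reported above, while the `BenchResult` class and the "cost per score point" and "cost per token unit" metrics are assumptions introduced here, not figures Google publishes. The source does not state the unit behind the token averages, so they are treated as opaque reported values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    """One model's reported Android Bench figures (hypothetical container)."""
    name: str
    score: float     # benchmark score as reported
    tokens: float    # average tokens per run; the source gives no unit
    cost_usd: float  # average cost per benchmark run, in dollars

    def cost_per_point(self) -> float:
        # Hypothetical metric, not from the source: dollars per score point.
        return self.cost_usd / self.score

    def cost_per_token_unit(self) -> float:
        # Effective dollars per reported token unit (derived, not published).
        return self.cost_usd / self.tokens


# Figures as reported in the summary above.
flash = BenchResult("Gemini 3.5 Flash", 63.7, 355.9, 147.1)
pro = BenchResult("Gemini 3.1 Pro Preview", 72.4, 73.3, 47.9)

token_ratio = flash.tokens / pro.tokens
cost_ratio = flash.cost_usd / pro.cost_usd
point_ratio = flash.cost_per_point() / pro.cost_per_point()
unit_price_ratio = flash.cost_per_token_unit() / pro.cost_per_token_unit()

print(f"tokens per run:       {token_ratio:.1f}x")
print(f"cost per run:         {cost_ratio:.1f}x")
print(f"cost per score point: {point_ratio:.1f}x")
print(f"cost per token unit:  {unit_price_ratio:.2f}x")
```

Under these assumptions, Gemini 3.5 Flash used roughly 4.9 times the tokens at roughly 3.1 times the cost, which implies its effective price per token unit was lower (about 0.63 times that of Gemini 3.1 Pro Preview) even though each run cost more. Measured per score point, the gap widens to about 3.5 times, which is consistent with the task-efficiency reading above.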