Updated · 9to5Google · Jun 12
Google Ranks Gemini 3.5 Flash 6th on Android Bench as Cost Hits $147.1 a Run

2 articles · Updated · 9to5Google · Jun 12

Summary

  • Gemini 3.5 Flash scored 63.7 on Google’s latest Android Bench, placing 6th and missing the top five in Android coding despite being pitched as a faster, cheaper model.
  • Google’s data showed the model was the most resource-intensive among the top contenders, averaging 355.9 tokens and $147.1 per benchmark run, with a reported latency of 14.2.
  • Gemini 3.1 Pro Preview outperformed it at 72.4, using 73.3 tokens and costing $47.9, while GPT 5.5 led the ranking with a score of 74.
  • The upper tier has changed little since earlier Android Bench updates, suggesting Google’s newest Flash model improved more on other tasks than on Android development.
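
The gap between the two Gemini models can be checked with simple arithmetic. The sketch below uses only the figures quoted in the summary; the dictionary layout and variable names are illustrative assumptions, and the token and cost units are taken as reported, since the summary does not spell them out.

```python
# Figures as quoted in the summary; units are whatever Android Bench reports.
flash = {"score": 63.7, "tokens": 355.9, "cost_usd": 147.1}  # Gemini 3.5 Flash
pro = {"score": 72.4, "tokens": 73.3, "cost_usd": 47.9}      # Gemini 3.1 Pro Preview

# How many times more Flash spends per run than Pro Preview.
cost_ratio = flash["cost_usd"] / pro["cost_usd"]
token_ratio = flash["tokens"] / pro["tokens"]

# How far Flash trails Pro Preview on the benchmark score.
score_gap = pro["score"] - flash["score"]

# Dollars spent per benchmark point, a rough efficiency measure.
flash_cost_per_point = flash["cost_usd"] / flash["score"]
pro_cost_per_point = pro["cost_usd"] / pro["score"]

print(f"cost ratio:  {cost_ratio:.2f}x")
print(f"token ratio: {token_ratio:.2f}x")
print(f"score gap:   {score_gap:.1f} points")
print(f"cost per point: Flash ${flash_cost_per_point:.2f}, Pro ${pro_cost_per_point:.2f}")
```

On these numbers the cost ratio comes out slightly above 3, which matches the "three times" framing, while the token ratio is closer to 5.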

Insights

Why did Google's cheaper AI model cost roughly three times as much as Gemini 3.1 Pro Preview on the same Android coding tasks?
Is Google's new AI model truly failing, or does its poor score reveal a flaw in how we benchmark coding agents?
As AI agents get powerful enough to delete databases, what new skills must developers learn to manage them safely?