Updated · InfoWorld · Jun 30
Model Routing Cuts LLM Spend by 50% as Companies Tackle Soaring Token Costs

1 article · Updated · InfoWorld · Jun 30

Summary

  • Model routing is gaining traction as companies try to rein in LLM bills by sending each prompt to the cheapest model that can handle the task.
  • The approach replaces one-model-per-session workflows with prompt-by-prompt selection, avoiding premium frontier models for simpler requests while using specialized models for tasks like code review.
  • Coinbase and others report cutting their AI spending in half even as token usage rises, which makes routing a cost-control tool rather than a cap on adoption.
  • Tools such as the open-source Claude Code Router already automate that selection across multiple popular models, extending the earlier shift from raw prompting to context-engineering layers.
  • The next step is prompt preprocessing, where AI rewrites or clarifies requests before routing them, further reducing dependence on any single LLM provider.
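The prompt-by-prompt selection described above can be sketched in a few lines. This is a minimal illustration, not how Claude Code Router or any company's router actually works: the model names, prices, capability labels, and the keyword-based `classify` heuristic are all hypothetical, and `preprocess` only normalizes whitespace as a stand-in for the AI-driven rewriting step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # hypothetical price, for illustration only
    capabilities: frozenset    # task labels this model is assumed to handle


# Hypothetical catalog: a cheap general model, a specialist, and a frontier model.
MODELS = [
    Model("small-general", 0.10, frozenset({"chat"})),
    Model("code-specialist", 0.50, frozenset({"chat", "code_review"})),
    Model("frontier", 3.00, frozenset({"chat", "code_review", "deep_reasoning"})),
]


def preprocess(prompt: str) -> str:
    """Stand-in for prompt preprocessing: here it only collapses whitespace."""
    return " ".join(prompt.split())


def classify(prompt: str) -> str:
    """Crude keyword heuristic (an assumption); a real router might use a classifier."""
    words = prompt.lower().split()
    if "review" in words and ("code" in words or "diff" in words):
        return "code_review"
    if "prove" in words or len(words) > 200:
        return "deep_reasoning"
    return "chat"


def route(prompt: str) -> Model:
    """Pick the cheapest model whose capabilities cover the classified task."""
    task = classify(preprocess(prompt))
    candidates = [m for m in MODELS if task in m.capabilities]
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


if __name__ == "__main__":
    for p in ["What is the capital of France?",
              "Please review this code diff",
              "Prove that the square root of 2 is irrational"]:
        print(f"{p!r} -> {route(p).name}")
```

Because the decision is made per prompt rather than per session, a simple question never reaches the frontier model, while a request the cheaper models are not assumed to handle still does.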

Insights

As companies slash AI spending by routing to cheaper models, is the golden age of frontier AI development already over?
If AI can now choose the best AI for a job, what is the ultimate role left for human engineers?