Model Routing Cuts LLM Spend by 50% as Companies Tackle Soaring Token Costs

1 article · Updated · InfoWorld · Jun 30

Model routing is gaining traction as companies try to rein in LLM bills by sending each prompt to the cheapest model that can handle the task.
The approach replaces one-model-per-session workflows with prompt-by-prompt selection, avoiding premium frontier models for simpler requests while using specialized models for tasks like code review.
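A minimal sketch of prompt-by-prompt selection follows. The model names, prices, and keyword classifier are all hypothetical and chosen only for illustration; a production router would likely classify prompts with something more robust than keyword matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str            # hypothetical label, not a real product name
    cost_per_1k: float   # illustrative price per 1,000 tokens, in USD
    skills: frozenset    # task types this model is assumed to handle


# Hypothetical catalog: a cheap generalist, a specialist, and a premium model.
MODELS = [
    Model("small-general", 0.10, frozenset({"chat", "summarize"})),
    Model("code-specialist", 0.40, frozenset({"code_review"})),
    Model("frontier", 1.50,
          frozenset({"chat", "summarize", "code_review", "reasoning"})),
]


def classify(prompt: str) -> str:
    """Rough keyword classifier (an assumption made for this sketch)."""
    text = prompt.lower()
    if "review" in text and ("code" in text or "diff" in text):
        return "code_review"
    if "prove" in text or "step by step" in text:
        return "reasoning"
    if "summarize" in text or "tl;dr" in text:
        return "summarize"
    return "chat"


def route(prompt: str) -> Model:
    """Return the cheapest model whose skills cover the prompt's task type."""
    task = classify(prompt)
    candidates = [m for m in MODELS if task in m.skills]
    return min(candidates, key=lambda m: m.cost_per_1k)


if __name__ == "__main__":
    for p in ["Summarize this memo",
              "Please review this code diff",
              "Prove this step by step"]:
        print(f"{p!r} -> {route(p).name}")
```

The selection runs once per prompt rather than once per session, so a single conversation can touch several models depending on what each request needs.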
Coinbase and others report cutting AI spending in half even as token usage rises, which makes routing a cost-control tool rather than a cap on adoption.
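To see how spend can fall while usage climbs, the following sketch works through the arithmetic with invented numbers. The token volumes, prices, and traffic split are assumptions for illustration only, not figures reported by Coinbase or any other company.

```python
def spend(total_tokens: int, mix: list[tuple[float, float]]) -> float:
    """Total cost in USD for a traffic mix.

    mix is a list of (share_of_tokens, price_per_1k_tokens) pairs;
    the shares are expected to sum to 1.0.
    """
    return sum(total_tokens * share / 1000 * price for share, price in mix)


# Hypothetical baseline: every token goes to a premium model at $1.50 per 1k.
before = spend(100_000_000, [(1.0, 1.50)])

# Hypothetical routed setup: 50% more tokens overall, but 75% of them
# go to a cheap model at $0.10 per 1k and only 25% to the premium model.
after = spend(150_000_000, [(0.75, 0.10), (0.25, 1.50)])

if __name__ == "__main__":
    print(f"before: ${before:,.0f}")
    print(f"after:  ${after:,.0f}")
    print(f"ratio:  {after / before:.2f}")
```

With these made-up inputs, token usage grows by half while total spend drops to a bit under half of the baseline, because most of the volume shifts to the cheaper tier.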
Tools such as the open-source Claude Code Router already automate that selection across multiple popular models, extending the earlier shift from raw prompting to context-engineering layers.
The next step is prompt preprocessing, where AI rewrites or clarifies requests before routing them, further reducing dependence on any single LLM provider.
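The preprocess-then-route flow can be sketched as a small pipeline. Everything here is hypothetical: `stub_rewriter` stands in for an AI rewriting step, `length_router` is a deliberately naive routing rule, and the model labels are placeholders. Passing both steps in as plain callables is one way to keep the pipeline independent of any single provider.

```python
from typing import Callable

Rewriter = Callable[[str], str]   # takes a raw prompt, returns a clarified one
Router = Callable[[str], str]     # takes a prompt, returns a model label


def stub_rewriter(prompt: str) -> str:
    """Stand-in for an AI rewriting step (assumption for this sketch).

    Collapses whitespace and expands one known vague request.
    """
    cleaned = " ".join(prompt.split())
    if cleaned.lower() in {"fix this", "fix it"}:
        return "Identify the bug in the attached code and propose a fix."
    return cleaned


def length_router(prompt: str) -> str:
    """Naive rule: short prompts go to a small model, long ones to a large one."""
    return "small-model" if len(prompt.split()) <= 20 else "large-model"


def handle(prompt: str,
           rewrite: Rewriter = stub_rewriter,
           route: Router = length_router) -> tuple[str, str]:
    """Clarify the prompt first, then pick a model for the clarified text."""
    clarified = rewrite(prompt)
    return route(clarified), clarified


if __name__ == "__main__":
    model, text = handle("  fix   this ")
    print(model, "|", text)
```

Because the router sees the clarified prompt rather than the raw one, swapping in a different rewriter or a different model catalog does not require changing `handle`.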