Updated · Tom's Hardware · Jun 13
AI Firms Shift to Chinese LLMs as Model-Switching Cuts Costs by Up to 95%

2 articles · Updated · Tom's Hardware · Jun 13

Summary

  • Cheaper Chinese and open-source models are gaining ground as companies retreat from frontier AI services whose token bills have become hard to sustain.
  • SemiAnalysis found top-tier subscriptions are priced far below potential usage costs: Anthropic’s $200 Claude Max 20x could consume about $8,000 in tokens, while OpenAI’s $200 ChatGPT Pro 20x could reach roughly $14,000.
  • Those economics worsen with heavy use: OpenAI’s base plans turn unprofitable above 11.4% utilization and its top tier above 5.7%, while Anthropic’s highest plan hits 0% gross margin at 10% utilization.
  • Companies are responding by routing routine tasks to cheaper models and calling frontier models only when needed; the Wall Street Journal said model switching can cut costs by up to 95%, and Lindy said moving much of its workload to DeepSeek V4 saved millions.
  • That shift is pressuring OpenAI and Anthropic to lower effective pricing, even as the most advanced frontier capabilities may increasingly stay on metered API access rather than flat-rate subscriptions.
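
The arithmetic behind the utilization and model-switching figures above can be sketched in a few lines of Python. This is a rough illustration, not SemiAnalysis's or the Wall Street Journal's methodology: the function names, the $2,000 full-use serving cost, and the 5% price ratio are hypothetical inputs chosen only to show how such percentages can arise.

```python
def break_even_utilization(price: float, full_use_cost: float) -> float:
    """Share of the plan's maximum usage at which serving cost equals the price."""
    return price / full_use_cost


def gross_margin(price: float, full_use_cost: float, utilization: float) -> float:
    """Gross margin on a flat-rate plan, assuming cost scales linearly with usage."""
    return 1.0 - (utilization * full_use_cost) / price


def routing_savings(cheap_share: float, cheap_price_ratio: float) -> float:
    """Fractional cost reduction versus sending every request to the frontier model.

    cheap_share: fraction of token spend moved to the cheaper model (0..1).
    cheap_price_ratio: cheaper model's price relative to the frontier model's.
    """
    blended_cost = (1.0 - cheap_share) + cheap_share * cheap_price_ratio
    return 1.0 - blended_cost


if __name__ == "__main__":
    # Hypothetical: a $200 plan whose full-use serving cost is $2,000.
    print(f"break-even utilization: {break_even_utilization(200, 2000):.0%}")
    print(f"margin at 5% utilization: {gross_margin(200, 2000, 0.05):.0%}")
    # Hypothetical: cheaper model priced at 5% of the frontier model.
    print(f"savings, 80% of spend rerouted: {routing_savings(0.8, 0.05):.0%}")
    print(f"savings, all spend rerouted: {routing_savings(1.0, 0.05):.0%}")
```

Under these assumed inputs, a 95% saving is the ceiling reached only when every request moves to the cheaper model, which is consistent with the "up to" wording in the reported figure.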

Insights

Why are companies paying millions for premium AI when open-source tools can slash costs by up to 95%?
AI firms lose money on subscriptions yet have 70% margins on APIs. Is this a broken model or a brilliant strategy?
AI's productivity is creating 'Dark Output'. How will we measure economic growth when it becomes invisible to traditional metrics?

Chinese LLMs Surpass U.S. in Global Token Usage: Cost, Performance, and Geopolitics Reshape AI Market in 2026

Overview

By early 2026, the global artificial intelligence landscape has changed dramatically, driven by the widespread adoption and cost efficiency of Chinese Large Language Models (LLMs). The shift is visible in surging global token consumption across leading AI models: platforms like OpenRouter now integrate over 300 models and process more than 30 trillion tokens monthly. The scale of this expansion points to intense competition, as developers increasingly weigh large-scale deployment and usage efficiency alongside raw model capability. As a result, Chinese LLMs have taken the lead in global token usage, offering cost-effectiveness without sacrificing performance.

...