Updated · Tom's Hardware · Jun 18
Chris Stokel-Walker Expands Local AI Setup to 80 Million Tokens a Day

1 article · Updated · Tom's Hardware · Jun 18

Summary

  • A second mini PC lifted Stokel-Walker’s locally hosted AI workload from 20-50 million tokens a day to about 50-80 million, after his first 96GB system began hitting capacity limits.
  • The expansion was driven by rising subscription and API costs: he says running the project through GPT-5.4-mini APIs would have cost about $1,500 in two months, roughly three-quarters of his first machine’s price.
  • The setup runs 24/7 through LM Studio, mostly on Qwen 9B models, with some work shifted to 27B and 36B models on the new box; throughput is about 300 tokens per second for prompt processing and 5-10 tokens per second for output.
  • Local models now account for two-thirds or more of his total AI token use, while paid plans from OpenAI and GLM are kept mainly for coding help and troubleshooting.
  • The move reflects a broader shift among heavy users toward local inference as frontier labs raise prices, tighten rate limits and gate features behind higher-cost tiers.
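
The cost and throughput figures in the summary can be sanity-checked with some quick arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes API spending would continue at the same monthly rate, and it ignores electricity and other running costs, none of which the summary quantifies.

```python
# Figures quoted in the summary; everything derived from them is an estimate.
API_COST_USD = 1_500      # estimated API spend over the period
PERIOD_MONTHS = 2         # length of that period
PRICE_FRACTION = 0.75     # API cost is "roughly three-quarters" of the first machine's price
PROMPT_TOKENS_PER_SEC = 300


def implied_machine_price(api_cost: float, fraction: float) -> float:
    """Machine price implied by 'API cost is `fraction` of the price'."""
    return api_cost / fraction


def breakeven_months(machine_price: float, api_cost: float, period_months: float) -> float:
    """Months of avoided API spend needed to cover the machine price.

    Assumes a constant monthly API cost and no running costs (an assumption,
    not something the summary states).
    """
    monthly_api_cost = api_cost / period_months
    return machine_price / monthly_api_cost


def tokens_per_day(tokens_per_second: float) -> float:
    """Tokens processed in 24 hours by one stream at a constant rate."""
    return tokens_per_second * 24 * 60 * 60


price = implied_machine_price(API_COST_USD, PRICE_FRACTION)
print(f"Implied price of first machine: ${price:,.0f}")
print(f"Break-even: {breakeven_months(price, API_COST_USD, PERIOD_MONTHS):.1f} months")
print(f"Prompt tokens/day at {PROMPT_TOKENS_PER_SEC} tok/s: {tokens_per_day(PROMPT_TOKENS_PER_SEC):,.0f}")
```

Under these assumptions the first machine would have cost about $2,000 and would pay for itself in under three months of avoided API spend. The tokens-per-day figure is for a single continuous stream; the summary does not say how many requests run in parallel, so it is not directly comparable to the daily totals reported.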

Insights

As cloud AI costs skyrocket, is a personal PC the key to escaping expensive monthly subscriptions?
Are we witnessing the dawn of AI sovereignty, or is local AI just a niche for tech experts?