Chris Stokel-Walker Expands Local AI Setup to 80 Million Tokens a Day

1 article · Tom's Hardware · Jun 18

A second mini PC lifted Stokel-Walker’s locally hosted AI workload from 20-50 million tokens a day to about 50-80 million, after his first 96GB system began hitting capacity limits.
The expansion was driven by rising subscription and API costs: he says running the project through the GPT-5.4-mini API would have cost about $1,500 over two months, roughly three-quarters of his first machine’s price.
The setup runs 24/7 through LM Studio, mostly on Qwen 9B models, with some work shifted to 27B and 36B models on the new box; throughput is about 300 tokens per second for prompt processing and 5-10 tokens per second for output generation.
Local models now account for two-thirds or more of his total AI token use, while paid plans from OpenAI and GLM are kept mainly for coding help and troubleshooting.
The move reflects a broader shift among heavy users toward local inference as frontier labs raise prices, tighten rate limits and gate features behind higher-cost tiers.
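
The cost and throughput figures above can be sanity-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the summary; the function names are illustrative. The daily-token estimate assumes a single request stream that is never idle, which the source does not confirm, so treat it as a rough per-stream ceiling and not a measurement.

```python
# Figures quoted in the summary; everything else here is illustrative.
API_COST_USD = 1500            # estimated API spend for the project
API_PERIOD_MONTHS = 2          # period that spend would have covered
COST_SHARE_OF_MACHINE = 0.75   # "roughly three-quarters" of the first machine's price
PROMPT_TOKENS_PER_SEC = 300    # reported prompt-processing speed
OUTPUT_TOKENS_PER_SEC = (5, 10)  # reported output speed range
SECONDS_PER_DAY = 24 * 60 * 60


def implied_machine_price(api_cost: float, share: float) -> float:
    """Machine price implied by 'api_cost is `share` of the machine price'."""
    return api_cost / share


def monthly_api_cost(api_cost: float, months: float) -> float:
    """Average API spend per month over the quoted period."""
    return api_cost / months


def daily_tokens(tokens_per_sec: float) -> float:
    """Tokens per day for ONE stream running 24/7 (assumption: never idle)."""
    return tokens_per_sec * SECONDS_PER_DAY


if __name__ == "__main__":
    price = implied_machine_price(API_COST_USD, COST_SHARE_OF_MACHINE)
    per_month = monthly_api_cost(API_COST_USD, API_PERIOD_MONTHS)
    print(f"Implied first-machine price: ${price:,.0f}")
    print(f"Avoided API spend per month: ${per_month:,.0f}")
    print(f"Prompt tokens/day, one stream: {daily_tokens(PROMPT_TOKENS_PER_SEC):,.0f}")
    low, high = (daily_tokens(rate) for rate in OUTPUT_TOKENS_PER_SEC)
    print(f"Output tokens/day, one stream: {low:,.0f} to {high:,.0f}")
```

On these numbers, the implied price of the first machine is about $2,000, and a single stream at 300 prompt tokens per second tops out near 26 million tokens a day. That suggests the reported daily totals are dominated by prompt tokens, though the source does not break the figure down or say how requests are scheduled.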