TT-QuietBox 2 Runs 120B AI Models Locally at 500 Tokens per Second
Updated · Futura · May 30
TT-QuietBox 2 is pitched as a local AI workstation that runs models such as GPT-OSS-120B, Llama 3.1 70B and Mixtral 8x7B on a personal PC without cloud access.
The system pairs 384 GB of total memory (128 GB of GDDR6 plus 256 GB of DDR5) with four Blackhole processors, each carrying 120 Tensix AI accelerators, to deliver nearly 500 tokens per second.
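The memory figures matter because model weights must fit somewhere fast. The following is a rough back-of-envelope sketch in Python, not a description of how the TT-QuietBox 2 actually allocates memory. It assumes nominal parameter counts, illustrative precisions (fp16, fp8, 4-bit), and that only the weights need to fit in the 128 GB of GDDR6. The article does not say which precision the system uses or how memory is split across the four processors, and KV cache, activations and runtime overhead are ignored. The helper names are hypothetical.

```python
"""Back-of-envelope capacity check for the quoted TT-QuietBox 2 figures.

Assumptions (not from the source): nominal parameter counts, illustrative
bytes-per-weight values, and that weights alone must fit in GDDR6.
KV cache, activations and runtime overhead are ignored.
"""

GDDR6_GB = 128
DDR5_GB = 256
TOTAL_GB = GDDR6_GB + DDR5_GB  # 384 GB, matching the quoted total

# Illustrative precisions and their storage cost per weight, in bytes.
BYTES_PER_WEIGHT = {"fp16": 2.0, "fp8": 1.0, "4-bit": 0.5}

# Assumed nominal parameter counts. Mixtral 8x7B shares its non-expert
# layers, so its total is roughly 46.7B rather than 8 x 7B = 56B.
MODELS = {
    "GPT-OSS-120B": 120e9,
    "Llama 3.1 70B": 70e9,
    "Mixtral 8x7B": 46.7e9,
}


def weight_gb(params: float, bytes_per_weight: float) -> float:
    """Approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_weight / 1e9


def fits_in_gddr6(params: float, precision: str) -> bool:
    """True if the weights alone fit in the 128 GB of GDDR6."""
    return weight_gb(params, BYTES_PER_WEIGHT[precision]) <= GDDR6_GB


def seconds_per_token(tokens_per_second: float) -> float:
    """Average time budget per generated token."""
    return 1.0 / tokens_per_second


if __name__ == "__main__":
    print(f"Total memory: {TOTAL_GB} GB")
    for name, params in MODELS.items():
        for precision, bpw in BYTES_PER_WEIGHT.items():
            gb = weight_gb(params, bpw)
            verdict = "fits in" if gb <= GDDR6_GB else "exceeds"
            print(f"{name:14s} {precision:6s} {gb:7.1f} GB {verdict} {GDDR6_GB} GB GDDR6")
    # Convert the quoted ~500 tokens/s into a per-token time budget.
    print(f"~{seconds_per_token(500) * 1000:.1f} ms per token at 500 tok/s")
```

Under these assumptions, a 120B model stored at 16-bit precision (about 240 GB) would not fit in GDDR6, while 8-bit or 4-bit weights would. The quoted 500 tokens per second works out to roughly 2 ms per token.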
Local processing targets the main pain points of cloud AI: rising API costs, latency tied to network quality, and the need to send sensitive data to remote servers.
Businesses can fine-tune models on their own data and keep them in-house, which positions the system as a privacy-focused, sovereign alternative to relying on major cloud providers.
The pitch also ties on-device AI to a broader shift away from energy-hungry data centers, whose heat, according to researchers, can be felt up to 6 miles away.
Can a $10,000 personal supercomputer truly challenge the billion-dollar dominance of cloud-based AI?
This workstation promises AI freedom, but can its software ecosystem escape Nvidia's shadow?
As AI moves from centralized clouds to private desktops, are we prepared for the risks?
Redefining Local AI: Tenstorrent TT-QuietBox 2 Delivers 120B Parameter Model Inference with Open-Source Control and Desktop Simplicity
Overview
Tenstorrent is making a major move into the AI workstation market with the global launch of the TT-QuietBox 2 in Q2 2026. The device targets growing demand for local, open-source AI by running large language models and coding agents entirely on-device, reducing reliance on cloud infrastructure. It supports models of up to 120 billion parameters, an increase over its predecessor, and is positioned as a powerful, private, and accessible option for on-premises AI across a wide range of users.