Virtuals Integrates Leyten Engine to Run 744 Billion-Parameter GLM-5.2 Across GPUs
Crypto Briefing · Jun 20
Summary
Virtuals said the Leyten integration will let its AI agent network run GLM-5.2 by splitting the 744 billion-parameter model across multiple GPUs instead of relying on a single card or centralized cluster.
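Leyten’s shard engine uses pipeline-parallel inference. It splits the model into sequential stages, places each stage on a different networked GPU, and passes intermediate activations from one stage to the next, so no single node has to hold the full model in memory.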
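The sketch below illustrates the general idea of pipeline-parallel sharding. It is not Leyten's API. The function names are hypothetical, and the numbers are assumptions: bf16 weights (2 bytes per parameter), 80 GB of memory per GPU, and a toy 10-layer model. It counts weights only, ignoring KV cache and activation memory. It estimates how many GPUs are needed just to hold 744 billion parameters, splits layers into contiguous stages, and runs a forward pass stage by stage.

```python
import math
from typing import Callable, List, Sequence


def min_stages(total_params: float, bytes_per_param: float, gpu_mem_gb: float) -> int:
    """Smallest GPU count whose combined memory fits the weights (weights only, assumed sizes)."""
    weight_bytes = total_params * bytes_per_param
    return math.ceil(weight_bytes / (gpu_mem_gb * 1e9))


def partition_layers(num_layers: int, num_stages: int) -> List[range]:
    """Assign contiguous, near-equal blocks of layer indices to each pipeline stage."""
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for s in range(num_stages):
        size = base + (1 if s < extra else 0)  # first `extra` stages take one more layer
        stages.append(range(start, start + size))
        start += size
    return stages


def pipeline_forward(x, stages: Sequence[range], layers: Sequence[Callable]):
    """Run each stage's layers in order; in a real deployment the activation
    would be sent over the network to the next GPU between stages."""
    for stage in stages:
        for i in stage:
            x = layers[i](x)
    return x


if __name__ == "__main__":
    # 744B params * 2 bytes = ~1.49 TB of weights, far more than one 80 GB card.
    print("GPUs needed for weights alone:", min_stages(744e9, 2, 80))
    toy_layers = [lambda v: v + 1 for _ in range(10)]  # hypothetical toy layers
    plan = partition_layers(len(toy_layers), 3)
    print("stage layout:", [list(r) for r in plan])
    print("output:", pipeline_forward(0, plan, toy_layers))
```

Contiguous blocks are used because each stage then only needs to send one activation tensor forward per step, which is the defining trait of pipeline (as opposed to tensor) parallelism.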
GLM-5.2, released publicly by Z.ai on June 16 under an MIT license, uses a mixture-of-experts design with roughly 39 billion to 40 billion active parameters per token and a 1 million-token context window.
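In a mixture-of-experts model, every expert's weights must still be stored somewhere, but each token is routed through only a subset of them. The arithmetic below uses the article's figures (744 billion total, 39 to 40 billion active) to show what share of the weights a single token touches. The FLOP estimate relies on the common rule of thumb of about 2 floating-point operations per active parameter per token for a forward pass; that is an approximation, not a published figure for GLM-5.2.

```python
TOTAL_PARAMS = 744e9  # total parameters, from the article


def active_fraction(active_params: float, total_params: float = TOTAL_PARAMS) -> float:
    """Share of all weights used for one token."""
    return active_params / total_params


def approx_forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb estimate: ~2 FLOPs (one multiply, one add) per active weight per token."""
    return 2 * active_params


if __name__ == "__main__":
    for active in (39e9, 40e9):
        print(
            f"{active / 1e9:.0f}B active -> {active_fraction(active):.1%} of weights, "
            f"~{approx_forward_flops_per_token(active) / 1e9:.0f} GFLOPs/token"
        )
```

This is why sharding matters even for a sparse model: per-token compute scales with the roughly 5% of active weights, while memory must accommodate all 744 billion.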
For Virtuals, which focuses on creating and monetizing onchain AI agents, the setup offers a route to frontier-scale inference in decentralized environments and could lower dependence on major cloud providers.