Updated · Crypto Briefing · Jun 20
Virtuals Integrates Leyten Engine to Run 744 Billion-Parameter GLM-5.2 Across GPUs

Summary

  • Virtuals said the Leyten integration will let its AI agent network run GLM-5.2 by splitting the 744 billion-parameter model across multiple GPUs instead of relying on a single card or centralized cluster.
  • Leyten’s shard engine uses pipeline-parallel inference: it splits the model into consecutive groups of layers (stages) and places them on different networked GPUs, so no single node has to hold the full model in memory.
  • GLM-5.2, released publicly by Z.ai on June 16 under an MIT license, uses a mixture-of-experts design with roughly 39 to 40 billion active parameters per token and a 1 million-token context window.
  • For Virtuals, which focuses on creating and monetizing onchain AI agents, the setup offers a route to frontier-scale inference in decentralized environments and could reduce its dependence on major cloud providers.
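The report gives no implementation details for Leyten's shard engine. As a rough illustration of the general technique it names, pipeline-parallel inference, the Python sketch below estimates how many GPUs are needed just to store 744 billion parameters, splits a toy model's layers into contiguous stages, and passes an input through those stages in order. Only the 744 billion total and the roughly 40 billion active parameters come from the article. The function names (`min_gpus`, `partition_layers`, `run_pipeline`) are hypothetical, and the 2 bytes per parameter, 80 GB GPUs, 80% usable memory and midpoint balancing rule are assumptions, not Leyten's or Virtuals' actual settings.

```python
import math
from typing import Callable, List

# Only the 744B total is from the article; precision and GPU memory below are assumptions.
TOTAL_PARAMS = 744e9


def min_gpus(num_params: float, bytes_per_param: float,
             gpu_mem_bytes: float, usable_fraction: float = 0.8) -> int:
    """Fewest identical GPUs whose usable memory can hold all the weights.

    For a mixture-of-experts model every expert must be stored somewhere,
    so this uses the total parameter count, not the active count per token.
    """
    weight_bytes = num_params * bytes_per_param
    return math.ceil(weight_bytes / (gpu_mem_bytes * usable_fraction))


def partition_layers(layer_sizes: List[float], num_stages: int) -> List[List[int]]:
    """Split layers into contiguous pipeline stages with roughly equal parameter totals.

    Each layer goes to the stage containing the midpoint of its cumulative
    parameter range, so stage indices never decrease and stages stay contiguous.
    """
    total = sum(layer_sizes)
    stages: List[List[int]] = [[] for _ in range(num_stages)]
    cum = 0.0
    for idx, size in enumerate(layer_sizes):
        mid = cum + size / 2
        stage = min(num_stages - 1, int(mid * num_stages / total))
        stages[stage].append(idx)
        cum += size
    return stages


Layer = Callable[[float], float]


def run_pipeline(layers: List[Layer], stages: List[List[int]], x: float) -> float:
    """Forward one input through the stages in order.

    In a real deployment each stage lives on a different GPU or node and the
    activations are sent over the network between stages; here the hand-off
    is just the local variable x.
    """
    for layer_ids in stages:
        for i in layer_ids:
            x = layers[i](x)
        # A network transfer of activations to the next stage would happen here.
    return x


if __name__ == "__main__":
    # Assumed: 2 bytes per parameter (e.g. BF16) on 80 GB GPUs, 80% usable for weights.
    print(min_gpus(TOTAL_PARAMS, 2, 80e9))  # 24
    print(f"active share per token: {40e9 / TOTAL_PARAMS:.1%}")  # 5.4%

    # Toy model: 8 equal-sized "layers" that each add a constant.
    layers = [lambda v, k=k: v + k for k in range(8)]
    stages = partition_layers([1.0] * 8, 4)
    print(stages)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
    print(run_pipeline(layers, stages, 0.0))  # 28.0
```

The sketch shows why sharding matters even for a mixture-of-experts model: only about 5% of the weights are active for any given token, but all of them still need a home, which under these assumptions means about two dozen 80 GB cards for the weights alone. Production pipeline engines also split inputs into micro-batches so that stages work concurrently instead of waiting on each other; that scheduling is left out here.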

Insights

Can decentralized AI networks realistically compete with the performance and reliability of big tech's centralized cloud infrastructure?
As AI agents begin managing billions onchain, what is the new frontier for security and risk management?
With AI agents set to control trillions in commerce, who is building the framework to ensure they act ethically?