Multiverse Computing improves Llama 3.1 8B perplexity with quantum adapters

8 articles · Quantum Zeitgeist · May 8
  • The team ran the 8-billion-parameter model on a 156-qubit IBM Quantum System Two processor, adding only 6,000 parameters through Cayley-parameterised unitary adapters.
  • Researchers said the hybrid approach offers a practical way to boost large language models without unsustainable classical compute growth, and demonstrated end-to-end inference on real gate-based quantum hardware.
  • Tests on the smaller SmolLM2 model recovered 83% of compression-related performance loss and suggested a path to larger-scale quantum utility despite current hardware noise.
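The article does not detail the adapters' exact construction, but the Cayley parameterisation mentioned above is a standard way to turn unconstrained trainable weights into an orthogonal (in the complex case, unitary) matrix, keeping the adapter a valid quantum-style operation during training. A minimal sketch of the real-valued version, assuming the standard Cayley transform U = (I − A)(I + A)⁻¹ for skew-symmetric A:

```python
import numpy as np

def cayley_unitary(A):
    """Map a skew-symmetric matrix A to an orthogonal matrix
    via the Cayley transform U = (I - A) @ inv(I + A)."""
    n = A.shape[0]
    I = np.eye(n)
    return (I - A) @ np.linalg.inv(I + A)

# Unconstrained free parameters (the trainable adapter weights,
# hypothetical values for illustration)
rng = np.random.default_rng(0)
n = 4
params = rng.normal(size=(n, n))

# Skew-symmetrise so the Cayley map yields an orthogonal matrix
A = params - params.T          # A.T == -A by construction
U = cayley_unitary(A)

# U is orthogonal: U.T @ U == I up to floating-point error
print(np.allclose(U.T @ U, np.eye(n)))  # True
```

Because gradients flow through `params` while `U` stays exactly orthogonal, this kind of parameterisation lets a classical optimiser train weights that compile to valid unitary gates on hardware.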

Achieving 83% Performance Recovery in Llama 3.1 with Quantum Adapters on Real Hardware

Overview

In May 2026, Multiverse Computing integrated specially designed quantum adapters into the Llama 3.1 8B model on IBM's 156-qubit System Two, achieving a 1.4% reduction in perplexity and recovering most of the performance lost to prior compression. The work was guided by the discovery of a sharp noise-expressivity phase transition, indicating that quantum benefits emerge only beyond a critical scale despite hardware noise. Complementing this, the company's CompactifAI technology compresses large language models by up to 90%, enabling faster, cheaper, and more energy-efficient AI that can run on edge devices. Together, these advances address the urgent need to curb AI's growing energy consumption and point toward a hybrid quantum-classical future.
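To make the headline figures concrete: perplexity is the exponential of the mean per-token negative log-likelihood, and "recovering X% of compression-related loss" measures how far adapters close the gap between the compressed and original models. A small sketch using hypothetical perplexity values (not figures from the article):

```python
import math

def perplexity(mean_nll):
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll)

# Hypothetical numbers, for illustration only:
ppl_base = 10.0        # original (uncompressed) model
ppl_compressed = 12.0  # model after compression
# Adapters that recover 83% of the compression gap land here:
ppl_adapted = ppl_compressed - 0.83 * (ppl_compressed - ppl_base)

# Fraction of the compression-induced degradation recovered
recovery = (ppl_compressed - ppl_adapted) / (ppl_compressed - ppl_base)
print(f"{recovery:.0%}")  # 83%
```

Lower perplexity means the model assigns higher probability to held-out text, so a 1.4% perplexity reduction and an 83% gap recovery are both improvements on this scale.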

...