Google Unveils 6x TurboQuant Memory Cut, Threatening AI Chip Demand
Updated · CNBC · May 25
TurboQuant, which Google launched on March 24, can cut the memory needed to run large language models sixfold, raising the risk that demand for high-bandwidth memory chips could weaken.
That threat hit a market built on AI-driven scarcity: Deutsche Bank said TurboQuant triggered a sharp selloff in major memory suppliers, though it is still unclear whether the technique will cause a lasting demand shift.
Samsung, SK Hynix, Micron and SanDisk have surged 114%, 186%, 141% and 156%, respectively, in 2026, reflecting investor bets that AI has ended the sector’s old boom-bust cycle and will keep prices elevated.
Several investors are warning those assumptions look fragile because supply could expand over the next three years, momentum crowding has intensified, and current valuations imply unusually durable margins.
In South Korea, Samsung and SK Hynix now make up more than 50% of the Kospi, leaving the broader market exposed if AI memory demand cools, even as some banks still forecast further gains.
TurboQuant’s 2.5-Bit Compression: Google’s Game-Changer for AI Memory, Market Shock, and the Future of LLMs
Overview
In March 2026, Google introduced TurboQuant, a breakthrough AI memory compression algorithm designed to shrink the key-value cache in large language models (LLMs). The key-value cache acts as a digital cheat sheet: it stores intermediate results for text the model has already processed so that they do not have to be recomputed. LLMs represent the meaning of text as vectors and compare those vectors by similarity, so the cache consists largely of vectors. TurboQuant optimizes how those vectors and the similarities between them are stored and handled, enabling more efficient LLM operations. By allowing large vector indices to be built and queried with much less memory and minimal preprocessing, TurboQuant promises to make powerful AI models run faster and on less hardware.
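To put the headline figures in perspective, the back-of-the-envelope calculation below estimates how large a key-value cache would be at 16 bits per value versus 2.5 bits per value. It is a rough sketch, not Google's method: the model shape (`layers`, `kv_heads`, `head_dim`, `seq_len`) is hypothetical, the 16-bit baseline is an assumption, and real compression schemes carry some overhead that this arithmetic ignores.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits_per_value):
    """Approximate key-value cache size in bytes for one sequence."""
    # Factor of 2: one key vector and one value vector per token, head, and layer.
    values = 2 * layers * kv_heads * head_dim * seq_len
    return values * bits_per_value / 8

# Hypothetical model shape, chosen only for illustration.
cfg = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32_768)

baseline = kv_cache_bytes(**cfg, bits_per_value=16)     # assumed 16-bit baseline
compressed = kv_cache_bytes(**cfg, bits_per_value=2.5)  # 2.5 bits per value
ratio = baseline / compressed

print(f"16-bit cache:  {baseline / 2**30:.2f} GiB")
print(f"2.5-bit cache: {compressed / 2**30:.2f} GiB")
print(f"reduction:     {ratio:.1f}x")
```

Under these assumptions the ratio is simply 16 / 2.5 = 6.4, which is in line with the roughly sixfold reduction reported. The ratio does not depend on the model shape; only the absolute sizes do.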