Google researchers have developed a compression algorithm called TurboQuant that cuts the memory required for chatbot conversations roughly sixfold while preserving response quality. The technique converts the data stored in an AI model's working memory into a more compact, efficient format.

This breakthrough addresses a significant bottleneck in deploying large language models. Current chatbots consume substantial memory during conversations, which limits how many users can access them simultaneously and drives up computational costs. By shrinking memory requirements without sacrificing quality, TurboQuant enables faster, cheaper deployments at greater scale.
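To make the scale of the problem concrete, here is a rough back-of-envelope calculation. The model dimensions and conversation length below are illustrative assumptions for a mid-sized language model, not figures from Google's work; only the sixfold reduction comes from the article.

```python
# Back-of-envelope sizing of a chatbot's conversational memory (KV cache).
# All model numbers are assumptions for illustration, not from the paper.
layers, hidden_dim, bytes_per_value = 32, 4096, 2   # assumed model, 16-bit values
tokens = 4096                                        # assumed conversation length

# Keys and values are both cached for every layer and every token.
kv_bytes = 2 * layers * hidden_dim * bytes_per_value * tokens
compressed_bytes = kv_bytes / 6                      # sixfold reduction per the article

print(f"full precision: {kv_bytes / 2**30:.2f} GiB per conversation")
print(f"compressed:     {compressed_bytes / 2**30:.2f} GiB per conversation")
```

Under these assumptions a single long conversation occupies about 2 GiB of accelerator memory at full precision; compressing it sixfold frees enough room to serve several times as many concurrent users on the same hardware.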

The algorithm works by compressing the numerical data that AI models accumulate during conversations. Rather than storing values at full precision, TurboQuant quantizes them into compact, low-precision representations. Previous compression methods of this kind typically degraded response quality; TurboQuant avoids that trade-off by optimizing how values are encoded into, and recovered from, the compressed format.
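The basic mechanics of quantization can be sketched in a few lines. The per-row symmetric scheme below is a deliberately simple stand-in, not TurboQuant's actual method, but it shows the core trade: each 32-bit value is replaced by a small integer code plus a shared scale factor, and decompression recovers only an approximation of the original.

```python
import numpy as np

# Minimal quantization sketch (illustrative only; TurboQuant's scheme is
# more sophisticated, but the storage saving works the same way).
def quantize(x, bits=4):
    levels = 2 ** (bits - 1) - 1                  # e.g. 7 levels for 4-bit signed
    scale = np.abs(x).max(axis=-1, keepdims=True) / levels
    codes = np.round(x / scale).astype(np.int8)   # compact integer codes
    return codes, scale

def dequantize(codes, scale):
    return codes.astype(np.float32) * scale       # approximate reconstruction

x = np.random.randn(4, 64).astype(np.float32)     # stand-in for cached values
codes, scale = quantize(x)
x_hat = dequantize(codes, scale)
print("max reconstruction error:", np.abs(x - x_hat).max())
```

The rounding step is where information is lost: the reconstruction error of each value is bounded by half the scale factor, so the art of a good quantizer lies in choosing encodings that keep that error from affecting the model's outputs.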

The technology has immediate practical applications. Cloud providers can serve more simultaneous users per server. Organizations with limited computational resources gain access to powerful AI tools. Latency decreases for end users waiting for responses.

Google's next steps involve integrating TurboQuant into production systems and testing its performance across diverse chatbot applications. Researchers will also explore whether the technique extends to other memory-intensive AI tasks beyond conversation models.