Deepgram, a real-time Voice AI platform, has launched Flux, a conversational speech recognition (CSR) model designed specifically for real-time voice agents. Unlike traditional automatic speech recognition (ASR), which was built for transcription use cases like captions or meeting notes, Flux is trained to understand the nuances of dialogue. It doesn’t just capture what was said. It knows when a speaker has finished, when to respond, and how to keep the flow of conversation natural and engaging.
The global voice AI agents market is projected to reach nearly $47.5 billion by 2034, growing at a compound annual growth rate (CAGR) of about 34.8%. This growth is driven primarily by the enterprise shift toward automated customer self-service, smarter agent-assist tools, and embedded conversational experiences across industries. But traditional speech-to-text (STT) systems weren’t designed to participate in live dialogue. To recreate conversational flow, developers have been forced to piece together transcription, voice activity detection, and turn-taking logic, a patchwork that leads to latency, errors, and frustrating user experiences.
Flux eliminates these problems by embedding turn-taking directly into recognition. It transforms speech recognition from simply transcribing words to modelling the flow of dialogue itself. This provides developers with the tools to build responsive, human-like voice agents without the complexity of workaround code or endless threshold tuning.
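For readers unfamiliar with the threshold tuning mentioned above, the following is a minimal sketch of the kind of hand-rolled turn detection developers have typically bolted onto a transcription stream. It is not Deepgram's code or API. The class name `SilenceEndpointer`, the helper `end_of_turn_frames`, the per-frame energy input, and all threshold values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SilenceEndpointer:
    """Hypothetical turn detector: ends the turn after a run of quiet frames."""

    energy_threshold: float = 0.02   # frames below this count as silence (assumed value)
    silence_frames_needed: int = 25  # e.g. 25 frames x 20 ms = 500 ms (assumed value)
    _speech_seen: bool = False       # has the speaker said anything this turn?
    _quiet_run: int = 0              # consecutive silent frames since last speech

    def push(self, frame_energy: float) -> bool:
        """Feed one frame's energy; return True when the turn is judged finished."""
        if frame_energy >= self.energy_threshold:
            self._speech_seen = True
            self._quiet_run = 0
            return False
        if not self._speech_seen:
            return False  # leading silence: no turn has started yet
        self._quiet_run += 1
        if self._quiet_run >= self.silence_frames_needed:
            # Reset so the next burst of speech starts a new turn.
            self._speech_seen = False
            self._quiet_run = 0
            return True
        return False


def end_of_turn_frames(energies, endpointer):
    """Return the frame indices at which the endpointer declares end of turn."""
    return [i for i, energy in enumerate(energies) if endpointer.push(energy)]


# Synthetic utterance: speech, a brief 3-frame pause, more speech, then silence.
energies = [0.0] * 5 + [0.5] * 10 + [0.0] * 3 + [0.5] * 5 + [0.0] * 30

patient = SilenceEndpointer(energy_threshold=0.1, silence_frames_needed=4)
eager = SilenceEndpointer(energy_threshold=0.1, silence_frames_needed=3)

print(end_of_turn_frames(energies, patient))  # one end of turn, after the final silence
print(end_of_turn_frames(energies, eager))    # also fires during the brief mid-utterance pause
```

The two runs differ by a single frame of tolerance, yet the eager setting cuts the speaker off mid-utterance while the patient one waits longer before responding. That trade-off between interruptions and latency is what a fixed silence threshold forces developers to tune by hand.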
“At Vapi, our mission has always been to give engineering teams a platform to build their conversational front door,” said Jordan Dearsley, founder and CEO of Vapi. “Deepgram’s launch of Flux is a perfect example of that vision coming to life. By embedding turn-taking directly into recognition, Flux solves one of the hardest challenges in conversational AI.”