Google introduced Veo 3, its latest text-to-video and audio generation model, on Tuesday at the Google I/O 2025 developer conference.
According to Google, Veo 3 can produce sound effects, background noises and dialogue synchronized with its video output. The company also claims that Veo 3 delivers improved video quality compared to its predecessor, Veo 2.
“For the first time, we’re emerging from the silent era of video generation,” said Demis Hassabis, CEO of Google DeepMind, during a press briefing. “[You can give Veo 3] a prompt describing characters and an environment and suggest dialogue with a description of how you want it to sound.”
Audio output could become Veo 3’s standout feature if it works as Google describes. AI-powered sound generators and models that add sound effects to video already exist, but Google says Veo 3 stands apart by reading the raw pixels of its footage and automatically syncing the generated audio to the visuals.
Veo 3 is now available in Google’s Gemini chatbot app for subscribers to the AI Ultra plan, priced at $249.99 per month. Users can prompt it with text or images.
What is Flow?
Flow is Google’s new AI filmmaking tool, tailored to its advanced models: Veo, Imagen and Gemini. It lets storytellers explore ideas freely and build cinematic clips and scenes.
“It’s early days, and we’re excited to shape the future of Flow with creatives and filmmakers,” the company said.
Flow aims to capture the feeling of effortless, iterative creation, where “time slows down and creation is full of possibility.” Google says the tool, built specifically for Veo, delivers exceptional prompt adherence and cinematic video output with realistic physics. Gemini models make prompting more intuitive, letting users describe their vision in natural, everyday language.
Users can import their own assets to craft characters or use Imagen’s text-to-image capabilities to generate new elements within Flow. Once a subject or scene is created, these “ingredients” can be reused across multiple clips and scenes to maintain visual consistency. Alternatively, an existing scene image can serve as the foundation for a new shot.