
Google DeepMind’s new AI tool uses video pixels and text prompts to generate soundtracks

Photo illustration of the shape of a brain on a circuit board.

Google DeepMind has taken the wraps off a new AI tool for generating video soundtracks. In addition to using a text prompt to generate audio, DeepMind's tool also takes into account the contents of the video.

By combining the two, DeepMind says users can employ the tool to create scenes with "a dramatic score, realistic sound effects or dialogue that matches the characters and tone of a video." You can check out some of the examples posted on DeepMind's website, and they sound pretty good.

For a video of a car driving through a cyberpunk-esque cityscape, Google used the prompt "cars skidding, car engine throttling, angelic electronic music" to generate audio. You can hear how the sounds of skidding match up with the car's movement. Another example creates an underwater soundscape using the prompt "jellyfish pulsating under water, marine life, ocean."

Although users can include a text prompt, DeepMind says it's optional. Users also don't have to meticulously match up the generated audio with the appropriate scenes. According to DeepMind, the tool can also generate an "unlimited" number of soundtracks for videos, letting users come up with an endless stream of audio options.


That could help it stand out from other AI tools, like the sound effects generator from ElevenLabs, which uses text prompts to generate audio. It could also make it easier to pair audio with AI-generated video from tools like DeepMind's Veo and Sora (the latter of which plans to eventually incorporate audio).

DeepMind says it trained its AI tool on video, audio, and annotations containing "detailed descriptions of sound and transcripts of spoken dialogue." This allows the video-to-audio generator to match audio events with visual scenes.

The tool still has some limitations. For example, DeepMind is trying to improve its ability to synchronize lip movement with dialogue, as you can see in this video of a claymation family. DeepMind also notes that its video-to-audio system depends on video quality, so anything that's grainy or distorted "can lead to a noticeable drop in audio quality."



© 2024 cyberbeatnews.com – All Rights Reserved.