Google Using YouTube Videos to Train Gemini, Veo 3 AI

12 months ago

Google is tapping one of its most valuable data assets, YouTube’s massive library of videos, to train its most advanced AI models, including Gemini and the new Veo 3 video generator, according to a CNBC report.

YouTube, with more than 500 hours of video uploaded every minute, offers Google an unmatched repository of real-world, human-generated content. However, this use of proprietary content has sparked both admiration for its technical ingenuity and scrutiny over privacy and data ethics.

The company has officially confirmed that it’s using this video trove to enhance the performance of its Gemini AI language model and Veo 3, a cutting-edge AI model that generates realistic videos with synchronized audio from text prompts.

At the Cannes Lions advertising festival, YouTube CEO Neal Mohan revealed that Veo 3 will soon be integrated into YouTube Shorts, further embedding AI into the heart of user-generated content. Veo 3, created by Google DeepMind, uses advanced training from video and audio datasets to produce dynamic 8-second clips complete with dialogue, sound effects, and visuals.

Google has long been discreet about how its AI models are trained, but it’s now openly acknowledging that YouTube videos form a crucial part of its AI training pipeline. This means that publicly available content uploaded by millions of users over the years has likely been used to help AI learn to recognize human gestures, environmental cues, spoken language, and video pacing.

While this data advantage gives Google a major edge in developing more nuanced and realistic AI models, it also raises serious concerns around consent, copyright, and how user data is repurposed.

YouTube’s terms of service allow the platform to use uploaded content for product improvement, but users may not have expected their videos to be used in training AI capable of generating synthetic voices, faces, and even events.

While Google promotes Veo 3 as a tool for creativity, marketing, and storytelling, critics are concerned about AI misuse and misinformation. Reports from outlets like TIME have shown how Veo 3 can create realistic but fictional videos, including scenes of protests, fake political speeches, or doctored news segments.

Even with digital watermarking tools like SynthID, critics argue that safeguards are still in development and may not be enough to prevent abuse.
