Apple, Nvidia, Anthropic Using Unauthorized YouTube Videos for AI Training

In a recent investigation, Proof News found that some of the world’s leading AI companies, including Apple, Nvidia, Anthropic, and Salesforce, used thousands of YouTube videos to train their AI models without permission.


This discovery highlights a significant breach of YouTube’s policies, which strictly prohibit harvesting content from the platform without authorization.

Proof News’ investigation revealed that subtitles from 173,536 YouTube videos, sourced from over 48,000 channels, were incorporated into a dataset named YouTube Subtitles. This dataset was used by prominent tech companies to enhance their AI capabilities.

The siphoned content included transcripts from educational sources such as Khan Academy, MIT, and Harvard, as well as from media outlets like The Wall Street Journal, NPR, and the BBC.

Even popular entertainment shows like “The Late Show With Stephen Colbert,” “Last Week Tonight With John Oliver,” and “Jimmy Kimmel Live” were not spared. Additionally, the investigation highlighted that YouTube megastars like MrBeast, Marques Brownlee, Jacksepticeye, and PewDiePie had their videos used in the training process.

David Pakman, host of “The David Pakman Show,” expressed his frustration upon discovering that nearly 160 of his videos were included in the YouTube Subtitles dataset.

Pakman, whose channel has more than two million subscribers and publishes new content daily, emphasized that his work is his livelihood. He argued that if AI companies profit from using his content, he deserves compensation.


Despite the significant findings, representatives from the involved companies, including EleutherAI, the creators of the dataset, have largely remained silent.

Anthropic, which used the Pile, a dataset that contains YouTube Subtitles, to train its AI assistant Claude, stated that its use of the dataset complies with public availability standards. Jennifer Martinez, a spokesperson for Anthropic, noted that YouTube’s terms of service cover direct use of its platform, not external datasets like the Pile.

Nvidia declined to comment on the issue, and representatives from Apple, Databricks, and Bloomberg did not respond to requests for comment.

Want to see more of our stories on Google?

Add iPhone in Canada as a Preferred Source on Google

P.S. Want to keep this site truly independent? Support us by buying us a beer, treating us to a coffee, or shopping through Amazon here. Links in this post are affiliate links, so we earn a tiny commission at no charge to you. Thanks for supporting independent Canadian media!
