AI chip maker Nvidia is scraping videos from YouTube and other sources for AI training, 404 Media reported. According to documents and chats seen by the media outlet, the company asked employees to scrape videos from Netflix, YouTube and other sources to build datasets for an AI model for its Omniverse 3D world generator, self-driving car systems and digital human products. The related project, reportedly titled Cosmos, hasn’t yet been released to the public.
The conversations showed that when employees raised questions about potential copyright issues, they were told that Nvidia was in full compliance with the “spirit of copyright law,” and that they had clearance from the highest levels of the company.
Emails seen by the outlet showed project managers discussing using between 20 and 30 virtual machines in Amazon Web Services to download 80 years' worth of videos every day.
Aside from these, Nvidia was also using a movie trailer database called MovieNet, internal libraries of video game footage, GitHub video datasets and InternVid-10M, which contains ten million YouTube video IDs.
Earlier, in April, YouTube CEO Neal Mohan had said that scraping from YouTube to train AI models was a “clear violation” of the platform's terms.
Published - August 06, 2024 10:39 am IST