Big Tech in ‘underground’ race to license archives that will train Artificial Intelligence

CEO Ted Leonard said he is in talks with multiple tech companies to license Photobucket’s 13 billion photos and videos to be used to train generative AI models that can produce new content in response to text prompts.

April 06, 2024 11:42 am | Updated 11:42 am IST - NEW YORK

Ted Leonard, Chief Executive Officer of Photobucket. File | Photo Credit: Reuters

At its peak in the early 2000s, Photobucket was the world’s top image-hosting site. The media backbone for once-hot services such as Myspace and Friendster, it boasted 70 million users and accounted for nearly half of the U.S. online photo market.

Today only two million people still use Photobucket, according to analytics tracker Similarweb. But the generative AI revolution may give it a new lease of life.

CEO Ted Leonard, who runs the 40-strong company out of Edwards, Colorado, said he is in talks with multiple tech companies to license Photobucket’s 13 billion photos and videos to be used to train generative AI models that can produce new content in response to text prompts.

He has discussed rates of between five cents and $1 per photo and more than $1 per video, he said, with prices varying widely both by the buyer and the types of imagery sought. “We’ve spoken to companies that have said, ‘we need way more,’” Mr. Leonard added, with one buyer telling him they wanted over a billion videos.

Photobucket declined to identify its prospective buyers, citing commercial confidentiality. The ongoing negotiations, which haven’t been previously reported, suggest the company could be sitting on billions of dollars’ worth of content and give a glimpse into a bustling data market that’s arising in the rush to dominate generative AI technology.

Tech giants such as Google, Meta, and Microsoft-backed OpenAI initially used reams of data scraped from the internet for free to train generative AI models such as ChatGPT that can mimic human creativity. They have said that doing so is both legal and ethical, though they face lawsuits from a string of copyright holders over the practice.

At the same time, these tech companies are also quietly paying for content locked behind paywalls and login screens, giving rise to a hidden trade in everything from chat logs to long-forgotten personal photos from faded social media apps.

“There is a rush right now to go for copyright holders that have private collections of stuff that is not available to be scraped,” said Edward Klaris from law firm Klaris Law, which says it’s advising content owners on deals worth tens of millions of dollars apiece to license archives of photos, movies and books for AI training.

Reuters spoke to more than 30 people with knowledge of AI data deals. OpenAI, Google, Meta, Microsoft, Apple and Amazon all declined to comment. Many major market research firms say they have not even begun to estimate the size of the opaque AI data market, where companies often don’t disclose agreements. Those researchers who do, such as Business Research Insights, put the market at roughly $2.5 billion now and forecast it could grow close to $30 billion within a decade.
