OpenAI CTO dodges questions around training data for text-to-video generator Sora

March 16, 2024 10:16 am | Updated 11:04 am IST

OpenAI’s Mira Murati concluded her answer by just saying, “I’m not going to go into the details of the data that was used, but it was publicly available or licensed data” 

The Hindu Bureau

Murati’s reaction has drawn flak on X for her apparent confusion around what publicly available data actually meant [File] | Photo Credit: REUTERS

A video clip from a WSJ interview with OpenAI CTO Mira Murati has gone viral on social media for the wrong reasons. Murati, who sat down earlier in the week with the publication’s Joanna Stern to discuss OpenAI’s new text-to-video tool, Sora, appeared unsure when asked about the datasets the tool had been trained on.

When asked what kind of data the company had used in Sora, Murati responded by saying they stuck to “publicly available data and licensed data.”

Stern then went on to specifically ask where this was from. “So, videos on YouTube?”

Murati made a confused expression in response to this, saying she didn’t know.

Stern persisted with the same line of questioning, asking, “Videos from Facebook, Instagram? What about Shutterstock? I know you guys have a deal with them.”

Murati replied that she wasn’t “actually sure about that,” adding that if the videos were publicly available they might have been used, but that she wasn’t “confident about it.”

She concluded her answer by just saying, “I’m not going to go into the details of the data that was used, but it was publicly available or licensed data.”

Murati’s reaction has drawn flak on X for her apparent confusion around what publicly available data actually meant, her refusal to answer the questions clearly, and possible ignorance.

The sourcing of training data for AI tools has become a legal minefield. Several authors and media publishers have already filed lawsuits against OpenAI for using their writing to train its AI chatbot, ChatGPT, without permission.
