Artificial Intelligence

OpenAI trying to rope in more news organizations to train AI models

As news outlets team up with AI firms to train models on their stories, the amount companies like OpenAI are ready to pay for copyrighted material is becoming apparent. According to The Information, OpenAI is willing to shell out between $1 million and $5 million annually to license copyrighted news articles for training its AI models.

This news comes on the heels of a recent report that Apple is seeking partnerships with media companies to use their content for AI training, and is dangling at least $50 million over several years for the data. The Verge has contacted OpenAI for comment on these figures.

These numbers are broadly in line with some past non-AI licensing deals. When Meta introduced the Facebook News tab (now no longer available in Europe), it reportedly offered up to $3 million per year to license news stories, headlines, and previews.

Yet it’s unclear whether the total payments would match some of the larger sums we’ve seen. In 2020, Google announced a $1 billion investment in partnerships with news organizations. More recently, in response to new laws, Google pledged $100 million a year to Canadian publishers for linking to their articles.

Today’s large language models have predominantly been trained on data sourced from the internet. While some AI developers keep the origins of their training data secret, many others disclose specifics about the datasets or web crawlers they use. The price of a training dataset varies with the provider, the dataset’s size, and the content it contains.

Some datasets, such as those from LAION, are open source and completely free, and have been used to train models like Stable Diffusion. AI developers also frequently deploy web crawlers to gather data from the internet for training their models.

However, this approach is running into significant hurdles. First, OpenAI’s GPT crawler has been blocked by certain companies, including The New York Times and Vox Media, the parent company of The Verge. Second, several organizations contend that training on their data amounts to copyright infringement.
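
The article does not say how these publishers block the crawler. One common mechanism is a rule in a site's robots.txt file, and the sketch below assumes that approach. It uses Python's standard-library robots.txt parser to check whether a given crawler may fetch a page. The robots.txt content and the example.com URL are hypothetical, and "GPTBot" is assumed here as the user-agent token for OpenAI's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only;
# it is not copied from any real publisher's site.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL used only to exercise the rules above.
ARTICLE_URL = "https://example.com/news/story"

gptbot_allowed = is_crawler_allowed(SAMPLE_ROBOTS_TXT, "GPTBot", ARTICLE_URL)
other_allowed = is_crawler_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", ARTICLE_URL)
print(gptbot_allowed, other_allowed)
```

Note that robots.txt is a voluntary convention: it only works when the crawler chooses to honor it, which is part of why publishers are also turning to lawsuits and licensing deals.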

The New York Times and others have sued OpenAI and Microsoft, claiming copyright infringement. They argue that ChatGPT and Microsoft’s Copilot can churn out content that closely resembles their work. Striking licensing deals with publishers is one way for AI companies to sidestep these problems, and it has become more common over the past year.

Rohan Sharma
