OpenAI's GPT-4o Training Accused of Using Paywalled Books

A new report alleges that OpenAI’s language model GPT-4o may have been trained on copyrighted content from O’Reilly books without permission. The claim, made by the AI Disclosures Project, raises concerns about copyright infringement and the ethics of how AI training data is sourced. The accusations center on the use of paywalled materials in the model’s development, and they have sparked a debate about transparency and access to information in the AI field. According to the report, GPT-4o’s performance suggests it is more familiar with content found behind paywalls than previous models were. The researchers used a technique called DE-COP to assess this knowledge, testing the model on excerpts from O’Reilly books. While OpenAI argues for greater flexibility in data usage, the allegations could lead to stricter regulation of AI training data and tighter copyright law.
