Source: Reuters

In recent weeks, OpenAI has again made mainstream headlines because of a lawsuit brought by several novelists, including Paul Tremblay, Mona Awad, Christopher Golden and Richard Kadrey, as well as comedian Sarah Silverman. They claim that ChatGPT can generate summaries of their works, which leads them to believe that the artificial intelligence (AI) chatbot was trained on those works.

The authors suggest that the works were likely obtained from online book datasets referenced in a paper OpenAI published in 2020 to introduce GPT-3, the model underlying ChatGPT. The plaintiffs therefore argue that OpenAI, which has partnered with tech giant Microsoft to deploy its technology, extracted data copied from thousands of books without permission, including their own, violating their copyrights and those of many other authors.

For this reason, the lawsuit, filed in the U.S. District Court for the Northern District of California in San Francisco, seeks class-action status so that other authors can join it. It requests compensatory damages and permanent injunctive relief to prevent OpenAI from continuing similar practices.

OpenAI has stated on several occasions that it trains its models on text from the Internet. Although it has not revealed exactly which sources it has used, the company has admitted to training its systems on hundreds of thousands of copyrighted books hosted on websites such as Sci-Hub, Z-Library and Bibliotik, in addition to sources such as Wikipedia and the extensive Project Gutenberg collection. Notably, this is the second lawsuit filed against OpenAI in a single week; the previous class action accused ChatGPT and DALL-E of violating the privacy of millions of Internet users.

OpenAI has not commented on the case, but it raises the possibility that legal disputes will become increasingly frequent for the company. It also reopens the question of where the limits of copyright and personal data protection lie in the age of AI, especially when many authors believe that ChatGPT has taken text from their books without permission and that OpenAI is profiting from their work without paying any compensation.