OpenAI has been fighting several lawsuits in the US from publications over unauthorized use of their content in its models, and it appears that such litigation has now come to India as well.
Indian news agency ANI has sued ChatGPT-maker OpenAI in a Delhi court for using its content without authorization. ANI alleges that OpenAI used the agency’s published content to train its models without explicit permission. ANI also accuses OpenAI of attributing fabricated news stories to the agency.
ANI says that it usually licenses its reporting to news organizations like The Financial Times and The Associated Press for a fee. But it alleges that OpenAI used its content for free to train its ChatGPT models. ANI says this infringes its copyright and harms its business interests. ANI further alleges that some of the content generated by ChatGPT falsely claimed that articles had been published by ANI, which damaged its credibility. ANI has asked for a court order to stop OpenAI from using its material, as well as a remedy for the alleged harm caused by the misattribution of fabricated content.
OpenAI, for its part, has denied the allegations. A spokesperson for OpenAI stated that the company builds its AI models using publicly available data and employs principles of fair use in doing so. OpenAI also emphasised that it has halted the use of ANI’s content for future training of ChatGPT. According to OpenAI, ANI’s content has been part of an internal block list since September, effectively preventing its use in any future AI model training.
But the fact that OpenAI put ANI’s content on a block list in September does suggest that the content might have been used by the company before then. This could end up being a tricky situation for many companies that have used publicly available data to train their models. In the US, the New York Times and the Chicago Tribune have similarly sued OpenAI, claiming their copyrights were infringed when its AI models were trained on their articles.
OpenAI and other AI companies, however, contend that although they ingested large parts of the publicly accessible web to train their models, the models usually do not repeat that material back to users verbatim. Instead, they say, the models use the ingested data to learn how the world works, and then draw on that knowledge in their responses. They say that it’s similar to how a person reads many articles and books, but then produces an original essay based on what they have learned.
It’ll be left to the courts to adjudicate how original ChatGPT’s responses are. The Delhi court, like other courts around the world, will also have to decide whether training AI models on public data constitutes copyright infringement. This is a new legal question brought about by the development of new technology, and how the courts answer it could have a large bearing on the field of AI as a whole.