The Dictionary Sues OpenAI Over AI Training Data
Summary
Encyclopedia Britannica and dictionary publisher Merriam-Webster have filed a lawsuit against OpenAI in federal court in Manhattan, accusing the company of using nearly 100,000 copyrighted articles and definitions to train its AI models without permission. The publishers claim that ChatGPT can reproduce their content nearly verbatim, diverting traffic away from them and harming their business, and that it may also mislead users about whether the content is licensed. The core legal question is whether such training constitutes fair use or copyright infringement. The answer carries broad stakes because dictionaries are foundational sources of training data. OpenAI defends its practices by asserting that its models are trained on publicly available data under fair use principles and that they transform information rather than copy it directly. The case is part of a larger wave of lawsuits brought against AI firms by content creators, and its outcome could significantly reshape how AI companies source training data and how fair use is defined in the AI era.
(Source: techputs)