Written by PEER DATA
In the rapidly evolving landscape of artificial intelligence, generative tools like large language models (LLMs) and specialized AI systems have sparked intense debates over intellectual property (IP) rights. A striking example is the ongoing New York Times v. OpenAI case, in which the newspaper alleges that ChatGPT not only trained on its copyrighted articles but also generated outputs reproducing substantial portions of them verbatim, potentially substituting for the original content and harming its market. As of December 2025, the lawsuit highlights a critical pivot: while courts have increasingly treated AI training on copyrighted data as "transformative" fair use, permitting ingestion of vast datasets to derive statistical patterns, the outputs of these systems can cross into infringement territory when they replicate or compete with protected works.
This distinction is particularly acute in the financial sector, where data providers craft proprietary products like indexes, risk models, and sentiment analyses. Unlike the broad fair use wins in cases such as Bartz v. Anthropic, where a California court deemed LLM training on books "quintessentially transformative" and granted summary judgment on fair use for inputs, generative outputs pose distinct risks: they can inadvertently or deliberately regurgitate IP-protected elements, undermining the value of curated financial datasets. This essay explores how AI outputs differ from training inputs under IP law, outlines infringement scenarios with a focus on finance, and offers mitigation strategies for data professionals. By contrasting output-side risks with permissive training rulings, we underscore that traditional copyright remains a bulwark, but vigilance is key in an AI-driven world.
Generative AI outputs are the novel content produced by models like ChatGPT, Claude, or financial-specific tools such as predictive analytics engines. These systems, trained on massive datasets, generate text, images, or insights in response to user prompts; in finance, outputs range from market forecasts to risk assessments. Unlike the training phase, where courts have leaned toward fair use, outputs are scrutinized under copyright's four-factor test in Section 107 of the U.S. Copyright Act: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the potential market.
The fourth factor, market impact, often proves decisive for outputs. In Warhol v. Goldsmith (2023), the Supreme Court held that even an arguably transformative adaptation may fall outside fair use when it serves substantially the same commercial purpose as the original and harms its market. This precedent echoes in AI cases; for instance, in Getty Images v. Stability AI, ongoing in the U.S. as of December 2025, Getty alleges that Stable Diffusion's image outputs not only replicate copyrighted photos but also dilute its stock image licensing market by generating substitutable visuals. The UK High Court, in a judgment issued on November 4, 2025, dismissed Getty's secondary copyright infringement claims related to AI training (citing insufficient evidence of UK-based model development) but found limited trademark infringement where outputs mimicked Getty watermarks, potentially setting global standards for how outputs can infringe trademarks or pass off as originals.
In finance, this scrutiny intensifies because data products often qualify as copyrighted compilations (original in their selection, arrangement, and analysis) rather than mere facts. A 2025 U.S. Copyright Office report on generative AI clarifies that while training may be fair if non-expressive, outputs reproducing expressive elements (e.g., proprietary financial models) can constitute direct infringement. Contrast this with Kadrey v. Meta and Bartz v. Anthropic, where judges dismissed claims focused solely on training, noting no market harm from the process itself. Yet Anthropic's record $1.5 billion settlement in Bartz, preliminarily approved on September 25, 2025, excluded output claims, leaving the door open for future litigation over generative results. For financial professionals, the lesson is clear: outputs shift the IP battleground from inputs to tangible harms, demanding a nuanced application of fair use.
To illustrate the key differences, here is how the four fair use factors tend to play out for AI training inputs versus generative outputs:

| Factor (17 U.S.C. § 107) | Training inputs | Generative outputs |
|---|---|---|
| Purpose and character | Often deemed "quintessentially transformative" (Bartz v. Anthropic) | Weighs against fair use when the output serves the same commercial purpose as the original (Warhol v. Goldsmith) |
| Nature of the work | Works ingested as data to extract statistical patterns | Reproducing expressive elements, such as curated compilations, cuts against fair use |
| Amount and substantiality | Whole works copied, but internally and non-expressively | Verbatim or substantial regurgitation weighs heavily against the defendant (NYT v. OpenAI) |
| Market effect | Courts have found little harm from the training process itself (Kadrey v. Meta) | Often decisive: substitutable outputs can undercut licensing and subscription revenue (Getty v. Stability AI) |
AI outputs infringe IP when they cross from transformative creation into unauthorized reproduction or substitution, especially in finance, where proprietary value lies in insights rather than raw data. Three scenarios stand out:
Verbatim or Substantial Regurgitation: In New York Times v. OpenAI, plaintiffs demonstrated ChatGPT outputting near-identical excerpts from articles, arguing this competes with licensed access and harms subscriptions. Applied to finance, imagine an AI tool generating a report that replicates a proprietary sentiment analysis from a provider like S&P Global, lifting key phrases or structures from copyrighted datasets without adding novel value (see the detection sketch after this list). This could infringe under the "amount and substantiality" factor, as in the early-2025 ruling against the AI legal research tool ROSS, where a court rejected fair use for a product built on copyrighted legal headnotes.
Market Substitution: If an AI produces free alternatives to paid financial products, such as derived ESG indexes or volatility benchmarks, it undercuts licensing revenue. A financial-sector hypothetical: a fintech app uses generative AI to output trading strategies that mirror a copyrighted algorithmic model, effectively substituting for the original without permission. This echoes concerns in Getty v. Stability AI, where outputs bearing Getty-like watermarks were deemed potential trademark infringements. In a 2025 mid-year review of AI cases, experts noted that such substitutions amplify market harm, particularly for creative compilations like financial indexes.
Emergent Infringement Through Aggregation: An AI may combine multiple sources to recreate a proprietary compilation, such as a custom risk assessment derived from aggregated transaction logs and economic indicators. Even without direct copying, this could still infringe if it harms the market, as discussed in the Copyright Office's generative AI report. Real-world parallels include ongoing suits brought by Warner Bros. against AI companies whose outputs allegedly replicated creative content, with financial echoes in cases involving proprietary data for AI-driven fraud detection. These scenarios underscore that while training may be fair (as in Bartz, where the court found no infringement from inputs alone), outputs demand case-by-case evaluation to avoid diluting IP in high-stakes sectors like finance.
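To make the "amount and substantiality" scenario concrete, here is a minimal Python sketch that flags near-verbatim overlap between an AI output and a protected reference text using word n-gram "shingles". The eight-word window, the 5% escalation threshold, and the sample texts are hypothetical illustrations, not legal standards; a production compliance pipeline would scan full document corpora.

```python
# Illustrative sketch: flag near-verbatim overlap between an AI output and a
# protected text via word n-gram shingles. Window size and threshold are
# hypothetical tuning choices, not legal tests.

def shingles(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word windows in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(output: str, protected: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that also appear in the protected work."""
    out, ref = shingles(output, n), shingles(protected, n)
    return len(out & ref) / len(out) if out else 0.0

if __name__ == "__main__":
    protected = ("Our proprietary sentiment index weights negative earnings-call "
                 "language twice as heavily as positive language across all "
                 "S&P 500 constituents.")
    output = ("The model weights negative earnings-call language twice as heavily "
              "as positive language across all constituents, per its methodology.")
    ratio = overlap_ratio(output, protected)
    if ratio > 0.05:  # hypothetical escalation threshold
        print(f"Review for substantiality: {ratio:.0%} of output n-grams match")
```

On these sample texts, roughly 45% of the output's eight-word windows match the protected sentence, the kind of overlap that would warrant escalation to counsel rather than an automatic legal conclusion.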
To navigate these risks, financial data providers must adopt proactive strategies blending legal, technical, and contractual tools. Start with enhanced licensing: update agreements to include AI-specific clauses that restrict output generation reproducing proprietary elements while still permitting pattern analysis. For instance, specify audit rights to inspect AI outputs for infringement, echoing the evolving standards set by the New York Times preservation order, which mandated retaining ChatGPT logs as evidence. This balances innovation with protection, avoiding blanket bans that could stifle sales while preserving recourse against infringing outputs.
Technologically, implement watermarking and provenance tracking. Tools like PEER DATA's Data Book of Record (DBOR™) platform enable observability to trace data lineage through AI workflows, detecting whether outputs mimic protected signals. Industry reports also advocate "privacy by design" to mitigate leaks, especially under GDPR/CCPA regimes for sensitive financial data.
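As a generic illustration of provenance tracking (a sketch under assumptions, not DBOR's actual API), a provider can seed each licensed data delivery with deterministic canary records and scan downstream AI outputs for them. The licensee names, delivery IDs, and token format below are all hypothetical.

```python
# Illustrative canary-based leakage detection, not a real platform API:
# seed each licensed delivery with a unique token, then scan AI outputs for it.
import hashlib

def make_canary(licensee_id: str, delivery_id: str) -> str:
    """Derive a deterministic, delivery-specific canary token."""
    digest = hashlib.sha256(f"{licensee_id}:{delivery_id}".encode()).hexdigest()[:12]
    # Embed the token where it survives ingestion, e.g. as a synthetic reference ID.
    return f"REF-{digest.upper()}"

def scan_output(output_text: str, canaries: dict[str, str]) -> list[str]:
    """Return the licensees whose canary tokens appear in an AI output."""
    return [licensee for licensee, token in canaries.items() if token in output_text]

if __name__ == "__main__":
    canaries = {lic: make_canary(lic, "2025-Q4") for lic in ("fund_a", "fund_b")}
    leaked_output = f"Volatility note {canaries['fund_a']}: regime shift expected."
    print(scan_output(leaked_output, canaries))  # -> ['fund_a']
```

Because each token is derived from the licensee and delivery, a canary surfacing in a model's output identifies not just that leakage occurred but which licensed feed it came through.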
Legally, monitor via reverse engineering: analyze AI outputs for similarities to your IP, leveraging frameworks like the EU AI Act for high-risk financial applications. In the U.S., pursue claims under traditional copyright, as affirmed in 2025 rulings emphasizing business liability for infringing outputs, and collaborate on standards through bodies like the BIS to address bias and infringement in aggregated data.
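One hedged way to operationalize that output monitoring is a similarity screen. The sketch below ranks AI outputs against a proprietary methodology document by TF-IDF cosine similarity using scikit-learn; the texts and the 0.5 escalation threshold are hypothetical, and a high score is a trigger for human and legal review, not an infringement finding.

```python
# Hedged sketch: rank AI outputs by cosine similarity to a proprietary
# methodology document using TF-IDF vectors. Requires scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

proprietary_doc = (
    "Index methodology: constituents are ranked by trailing 90-day liquidity, "
    "then rebalanced monthly with a 5% single-name cap."
)
ai_outputs = [
    "Stocks are ranked by trailing 90-day liquidity and rebalanced monthly "
    "with a 5% single-name cap.",
    "Diversify across asset classes to manage risk.",
]

# Fit the vocabulary on the reference plus the outputs, then vectorize both.
vectorizer = TfidfVectorizer().fit([proprietary_doc] + ai_outputs)
ref_vec = vectorizer.transform([proprietary_doc])
out_vecs = vectorizer.transform(ai_outputs)

for text, score in zip(ai_outputs, cosine_similarity(out_vecs, ref_vec).ravel()):
    flag = "ESCALATE" if score > 0.5 else "ok"  # hypothetical review threshold
    print(f"{score:.2f} {flag}: {text[:60]}")
```

Lexical similarity alone cannot establish copying, but as a triage layer it focuses scarce legal attention on the outputs most likely to reproduce a protected methodology.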
Generative AI outputs represent a nuanced frontier in IP law, where the transformative leniency afforded to training inputs gives way to stricter scrutiny of market-harming reproductions. In finance, this means protecting proprietary compilations from substitution risks, as evidenced by ongoing cases like Getty and NYT v. OpenAI. Traditional copyright provides a strong defense, but as 2025 developments show, adaptation is essential. Financial data professionals should refine licenses, invest in traceability, and stay vigilant to turn AI into an ally rather than a threat, ensuring innovation thrives without eroding IP value.