January 5, 2026 · Negotiate The Future

Training Data, Copyright, and the IP Commons

Resolving the legal ambiguity at the heart of foundation model development

Foundation models are trained on human creative work at a scale that has no legal precedent. Books, articles, code, images, music — ingested by the billions, used to build systems that can replicate stylistic and technical patterns from that work. The courts are catching up. Congress has not started.

The fair use doctrine took shape around transformative uses by humans, such as parody, commentary, and scholarship. Whether training a commercial AI system on copyrighted works without a license constitutes fair use is genuinely unsettled. Pending litigation will produce some clarity, but case-by-case resolution is the worst possible way to establish rules for an industry at this scale.

A legislative framework should distinguish between training (where some compensation or licensing mechanism may be appropriate) and output generation (where rights in an output depend on how directly it reproduces protected work). It should create safe harbors for non-commercial research while requiring commercial developers to engage with rights-holder markets.

The goal is not to stop foundation model development. It is to ensure that the economic value extracted from human creative work flows back, in some form, to the ecosystem that produced it. A healthy creative commons and a healthy AI industry are not in fundamental conflict — but that outcome requires deliberate policy, not litigation roulette.
