Commentary

The problems with California’s pending AI copyright legislation

John Villasenor

June 30, 2025

The California legislature is considering AB-412, a bill that would require developers of generative AI models to document and publish information on some categories of copyrighted material used during the training process.
The bill language has ambiguities and could introduce problems for AI startups that might not have the resources to comply with its requirements.
Instead of enacting premature legislation on the use of unlicensed copyrighted content in training AI, states should wait for clarity from the courts as they consider ongoing lawsuits weighing fair use claims.



AB-412, a bill currently moving through California’s legislature, is a well-intentioned but problematic approach to addressing artificial intelligence (AI) and copyright. If enacted into law, it would undermine innovation in generative AI (GenAI) not only in California but also nationally, because it would impose onerous requirements on both in-state and out-of-state developers that make GenAI models available in California.

The extraordinary capabilities of GenAI are made possible by extremely large sets of training data that often include copyrighted content. AB-412 arose from the very reasonable interest rights owners have in understanding when and how their content is being used to build GenAI models. But the bill imposes a set of unduly burdensome and unworkable obligations on GenAI developers. It also favors large rights owners, which will be better equipped than small rights owners to pursue the litigation contemplated by the bill.

Developer obligations and exposures

AB-412 defines “covered material” as “material registered, preregistered, or indexed with the United States Copyright Office” in accordance with U.S. copyright law. With a few exceptions, the bill would require developers to document any covered material that “the developer knows were used by the developer to train” GenAI models. In addition, the bill would obligate developers to “make reasonable efforts to identify and document any other covered materials that were used” in training models.

Developers would also be required to facilitate generation of digital fingerprints of the covered materials they use and to “make available a mechanism on the developer’s internet website allowing a rights owner to submit a request for information about the developer’s use of covered material.”

If a developer “fails to provide” the mandated information within 30 days after receiving a request, a rights owner can bring a civil action. The potential penalties levied on developers can include $1,000 or more per day, reimbursement of attorney’s fees, “injunctive or declaratory relief,” and “any other relief the court deems appropriate.”

The bill exempts developers from the obligations and legal exposures described above if they develop and use models only for academic or government research, train models only using data they make available for free online, train “exclusively using covered materials for which the developer is the rights owner,” or train without using any covered materials.

Disputes in interpreting bill language

The wording of the bill contains multiple sources of ambiguity. For instance, what level of effort is necessary for a developer to comply with the requirement to “make reasonable efforts to identify and document” the covered materials used in training its model? Does “reasonable” depend in part on the developer’s financial and staffing capacity to conduct these investigations? Or should “reasonable” be interpreted without regard to developer capacity?

Furthermore, what constitutes a legally actionable failure to respond to a rights owner? Under AB-412, the information a developer must provide includes a “list of covered materials held by the rights owner that a fingerprint assessment suggests are likely to be present in the developer’s dataset.” Inevitably, developers and rights owners will have different interpretations of the scope of “suggests are likely to be present,” and therefore different definitions of what constitutes a failure to provide the mandated information.

Resolving these differences can involve time-consuming and expensive litigation, requiring not only an interpretation of the statutory language but also a discovery-intensive investigation of whether the developer has disclosed sufficient information to comply with that language.

While well-resourced rights owners and developers have the money and staying power to litigate these questions, smaller developers and rights owners will be disadvantaged. This will be especially acute when the parties are mismatched (e.g., a major rights owner suing a small developer, or a small rights owner pursuing a claim against a major developer).

There will also be disputes over who is a “developer.” A company that develops and offers a GenAI model in California clearly qualifies. But what about a company that develops a GenAI model in a different state and then licenses it to a third party that makes it available in California? If the third party has no information about the training data, and the company that created the model does not know that the model is being used in California, which party, if either, is subject to the bill’s obligations?

Impact on innovation

AB-412 would disproportionately burden AI startups—the very companies often at the forefront of technology innovation. Small companies are much less likely to have the staff and financial resources necessary to comply with the bill’s requirements.

Consider an AI startup that has endeavored in good faith to comply with the bill’s language regarding documentation of materials used and responses to information requests from rights owners, but that nonetheless is sued for alleged noncompliance. The startup may be unable to fund its own defense and could be forced to cease operations long before a yearslong litigation process reaches a final conclusion.

The result will chill GenAI innovation not only in California but also nationally, as the bill applies not only to developers that use GenAI models “commercially in California” but also to developers inside and outside the state that make a model “available to Californians for use.” The language of the bill is capacious enough to also include foreign companies that make models available in California, but as a practical matter U.S. companies will be most impacted. This will drive GenAI innovation overseas—and thereby contribute to undermining U.S. AI leadership.

Unnecessary legislation in light of ongoing fair use litigation

An additional concern is that AB-412 is unnecessary given the many lawsuits moving through the courts over whether, and under what circumstances, the use of copyrighted material to train AI systems constitutes fair use.

If—as seems likely—courts find that at least some forms of GenAI training using copyrighted material are protected under fair use, then the documentation and reporting obligations of AB-412 would be wholly unnecessary and unjustified. Under this outcome, the bill would represent an intrusion by a state into federal copyright law, as it would burden the flexibility that fair use was specifically designed to allow. And it would raise First Amendment issues by restricting uses of information that, under fair use, are not blocked by copyright law.

On the other hand, if courts find that fair use does not protect GenAI training, then the bill is also unnecessary. After all, if training GenAI models using unlicensed copyrighted material is infringement, then rights owners would already have copyright law on their side. Put another way, rights owners would already have the statutory backing they need to seek redress through federal courts. Under this outcome, there is still an important question of how rights owners will be able to determine whether their content was used without authorization. But that would be best addressed at the federal level.

Other issues

If AB-412 becomes law in California, other state legislatures will face pressure to enact similar laws, though the specific statutory language and requirements will vary from state to state. For a company offering AI models nationally, the aggregate compliance burden would be enormous. This would once again disadvantage tech startups, which would face a potentially crushing regulatory burden.

The upshot is that AB-412 would chill GenAI innovation and favor deep-pocketed companies at the expense of small tech companies and small rights owners. A far better approach is to wait for clarity from the courts on the question of whether training using unlicensed copyrighted material is protected by fair use. At that point, the legislative dialogue could be restarted regarding whether new laws on this topic are needed, and if so, whether there is also a potential role for state-level legislation in light of federal copyright law.

There is also the issue of a provision in pending federal legislation that, if enacted into law, would impose a 10-year moratorium on enforcement of most state AI laws and regulations. The language of the moratorium has been changing. In addition, it has drawn opposition from some Republican lawmakers—which is notable given the narrow majority that Republicans hold on Capitol Hill. As a result, it is too early to know whether and in what form the moratorium will emerge from the federal legislative process. However, if it does become law, it would render AB-412 and many other proposed and enacted state AI laws unenforceable.