Robotic rulemaking

As it has rocketed to some 100 million active users in record time, ChatGPT is provoking conversations about the role of artificial intelligence (AI) in drafting written materials such as student exams, news articles, legal pleadings, poems, and more. The chatbot, developed by OpenAI, relies on a large language model (LLM) to respond to user-submitted requests, or “prompts” as they are known. It is an example of generative AI, a technology that upends our understanding of who creates written materials and how they do it, challenging what it means to create, analyze, and express ideas.

Rulemaking by federal agencies is a text-intensive process, encompassing both the rules themselves, which express not only the law but also the agencies’ rationales for their regulatory choices, and the public comments, which arrive almost exclusively in the form of text. How might generative AI intersect with rulemaking? In this essay, we work through some use cases for generative AI in the rulemaking process, for better and for worse, both for the public and for federal agencies.

Public comments and generative AI

For the public, generative AI might help people structure their information and views into public comments that have a better chance of influencing agency decisions. While agencies usually permit commenters to send in whatever text they want, more sophisticated comments tend to follow a professional format that contains substantive and sometimes highly technical information. A fairly constant worry with respect to public participation in rulemaking is that special interests overtake diffuse interests due to collective action problems. While readers of this series likely follow rulemaking closely, regulation remains esoteric for most people. Even for those aware of the rulemaking process, figuring out the style and content of a comment might seem out of reach. (Brookings published a helpful guide to commenting, by the way.) Scholars and policymakers disagree about the extent to which the public’s awareness of and participation in rulemaking is a problem that needs to be remedied, but at a minimum a tool that helps interested people compose a persuasive comment could be useful.

Someone could, for example, prompt a generative AI tool to summarize that person’s position on a proposal and knit it into a comment that looks organized and clear. The prompt could be something like: “Write a comment to the Consumer Product Safety Commission telling them that I support their proposed rule on fireworks.” Even better, the prompt could guide the AI to emphasize specific concepts or reasons for the person’s views: “Write a comment to the Consumer Product Safety Commission telling them that I support their proposed rule on fireworks because fireworks can be traumatic for little kids and pets.” In our experience, ChatGPT can readily create convincing public submissions based on such straightforward prompts. For one Department of Labor proposal, simply requesting that the chatbot produce several paragraphs objecting to the rule resulted in text comparable to a mass comment campaign submission, and the content was quickly inverted by asking for a supportive comment.
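To make this concrete, here is a minimal sketch of how such a prompt could be sent to a large language model programmatically, assuming access to OpenAI’s Python client and an API key; the model name and prompt wording are illustrative, and the same request could just as easily be typed into the ChatGPT interface.

```python
# Minimal sketch: drafting a public comment from a plain-language prompt.
# Assumes the OpenAI Python client (pip install openai) and an API key in
# the OPENAI_API_KEY environment variable; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Write a comment to the Consumer Product Safety Commission telling them "
    "that I support their proposed rule on fireworks because fireworks can "
    "be traumatic for little kids and pets."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative; any chat-capable model would do
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```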

Generative AI also takes the possibility of “mass” and “malattributed” comments to the next level. Mass comments are “identical and near-duplicate comments” that are often “sponsored by organizations and submitted by group members and supporters to government agencies in response to proposed rules.” A team of researchers from The George Washington University and the Israel Democracy Institute wrote about the political reasons why groups organize these campaigns, likening them to the kind of lobbying activity that happens in Congress and other venues. Whether these mass comment campaigns actually influence agencies is the subject of some debate. The laws governing regulatory decisions generally do not call upon, or allow, the agencies to factor in public opinion; rather, agencies seek substantive and technical information from public comments. And the number of comments received is not a reliable proxy for general public opinion anyway because such submissions are not made by a representative sample of the population—even setting aside the possibility that some of the comments were not sent by real people, a possibility that generative AI increases. So most observers (with Professor Nina Mendelson a notable exception) have been dismissive of the role of mass comment campaigns in agency rulemaking decisions. Yet mass comment campaigns persist, for reasons that political scientists like Devin Judge-Lord are exploring in ongoing research. The reality is that generative AI arrives at a time when mass comment campaigns are a regular, if not frequent, component of rulemaking, so we can expect the two to intersect.

Combining generative AI with mass comment campaigns could lead to mass comments that look less duplicative, as the same idea could be expressed in many different ways with some support from an AI tool. Agencies currently have access to language processing tools that allow them to group comments based on the similarity of their text. This helps agencies meet their burden under the law to consider and respond to all significant comments. More varied comments will strain the current set of tools and likely require agencies to dedicate more resources to comment analysis. That could further slow the already cumbersome rulemaking process as agencies figure out how to cope with overwhelming volumes of differentiated and ostensibly substantive comments. For advocates looking to gum up the works, this could be an appealing tactic.
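As a rough illustration of the similarity-based grouping agencies rely on today, the sketch below uses scikit-learn’s TF-IDF vectorizer and cosine similarity to cluster near-duplicate comments; the sample comments and the 0.8 threshold are placeholder assumptions, and more varied AI-assisted comments would score well below such a threshold.

```python
# Rough sketch: grouping near-duplicate comments by text similarity.
# Uses scikit-learn's TF-IDF vectorizer and cosine similarity; the sample
# comments and the 0.8 threshold are placeholder assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

comments = [
    "I support the proposed fireworks rule because fireworks frighten pets.",
    "I support the proposed fireworks rule because fireworks frighten pets and children.",
    "The agency should withdraw the rule entirely.",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(comments)
similarity = cosine_similarity(tfidf)

# Greedily assign each comment to the first earlier comment it closely matches.
THRESHOLD = 0.8
group = list(range(len(comments)))
for i in range(len(comments)):
    for j in range(i):
        if similarity[i, j] >= THRESHOLD:
            group[i] = group[j]
            break

for idx, g in enumerate(group):
    print(f"comment {idx} -> group {g}")
```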

In response, following the approach they took with mass comment campaigns, agencies might be tempted to spend resources developing tools that identify which comments were generated by an AI rather than a human. Such tools are already in production for other purposes. While these tools are not “fully reliable,” they could alert agency staff that a comment under review was likely generated by AI. It is not immediately clear how an agency would make use of such information, however, because the Administrative Procedure Act requires only that the commenter be a “person,” and a person could have submitted the comment no matter who or what drafted it. Perhaps the alert could encourage agency staff to read the comment with some skepticism, but it is not obvious at this point that such an approach would be reasoned or fair.
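One common heuristic behind such detection tools, though by no means the only one and not a claim about how any particular production tool works, is to score a comment’s perplexity under a smaller language model and treat unusually predictable text as a possible sign of machine generation. The sketch below uses the publicly available GPT-2 model via Hugging Face’s transformers library; the comment text is a placeholder, and any cutoff for “suspicious” would be an arbitrary assumption.

```python
# Rough sketch of one detection heuristic: score a comment's perplexity
# under a small public language model (GPT-2) and treat unusually "smooth,"
# predictable text as a possible sign of machine generation. This heuristic
# is known to be unreliable on its own; the comment text is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

comment = "I am writing to express my strong support for the proposed rule."
print(f"perplexity: {perplexity(comment):.1f} (lower can suggest model-generated text)")
```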

Generative AI can be viewed as part of an ongoing tit-for-tat for public participation, with commenters deploying more sophisticated commenting methods and agencies attempting to respond with their own technology. Such an arms race is a waste of resources, though, if the end result is a large body of comments that neither represent the views of the general public nor offer novel and reliable information to the agency. More comments do not necessarily lead to better regulatory choices. The “arms race” frame is also troubling as applied to public input in a process intended to welcome it. Whether the type of participation that generative AI facilitates is the right kind of participation is part of what makes it such a provocative development.

If generative AI adds to the richness of mass comments, that could be an improvement over many mass comment campaigns which tend to express up-or-down sentiment. Personal stories woven into comments can sometimes shed light on problems that agencies did not anticipate—the question is whether generative AI is poised to actually elucidate such richness or simply fake it. If regulators end up altering rules because of convincing but made-up “facts,” that would certainly be a step backward. In our experience, agency staff work hard to substantiate the information provided to them by public comments rather than accepting them at face value, but it is not implausible to imagine such safeguards breaking down. In that case, the potential for review in the courts offers an important backstop.

Taking this analysis one step further, Professor Michael Herz coined the term “malattributed comments” after the spectacle that accompanied the Federal Communications Commission’s (FCC) Net Neutrality rulemaking, in which millions of public comments claimed to be from people who either did not exist or who did not actually send comments. In a study commissioned by the Administrative Conference of the United States (ACUS), researchers (including one of us) concluded that the risks of an agency being misled by malattributed comments are lower than might be expected because of the way agencies evaluate comments. Generative AI disturbs this equilibrium because it may help bad actors generate comments that look more persuasive, i.e., comments that seemingly present evidence beyond mere sentiment.

By reducing the costs of producing “malattributed” comments, generative AI could lead to a pooling equilibrium—to borrow a concept from game theory that is often applied to insurance markets—where agencies can no longer meaningfully distinguish between valid and malicious comments. Agencies could then be inclined to assume all comments might be “fake” and discount their relevance, weakening public commenting as an avenue for meaningful public input and the formulation of improved policies and, ultimately, making people worse off. That need not come to pass, however. The Administrative Procedure Act does not permit agencies to entirely dismiss all public comments in this manner, nor does it categorically prohibit the public from using AI to aid in comment creation. Agencies also have a track record of collaborating to address novel issues, such as mass comment campaigns, via the eRulemaking program, and we expect that work to continue. Overall, as commenters reach for ways to use generative AI, agencies would be wise to lean on this existing governance structure as they consider potential responses.

Agency workflows and generative AI

Generative AI also offers some promises and perils for internal agency processes. Beyond malattributed comments, one worry is that flooding rulemaking dockets with a virtually unlimited supply of unique comments would incapacitate government systems and prevent other users from submitting public input. While this scenario sounds alarming, the rulemaking system is fairly robust against a torrent of bot-generated comments. More specifically, Regulations.gov—the site that a majority of agencies use to accept public submissions on rules—already implements several techniques to manage large volumes of comments.

First, Regulations.gov employs a CAPTCHA system developed by Google to distinguish between humans and bots. This prevents a computer program from automating comment submissions through the web interface, meaning that bad actors who want to spam the system would need to do so manually. In fact, one paper warning against the risk of bot submissions to rulemakings suggested this very solution. Second, the Regulations.gov Application Programming Interface (API) provides a way for organizations to submit multiple comments in an automated fashion—within certain limits. The API, which is managed by the General Services Administration (GSA), requires adherence to terms of service and uses a key to authenticate submissions. Accordingly, GSA can revoke access when it detects malicious activity. Further, submissions are capped at 50 per minute or 500 per hour (whichever limit is reached first). This throttling inhibits malicious users from overwhelming the system before being identified and could be made stricter if necessary. While these safeguards are not foolproof, they provide meaningful protection against incapacitating the comment system with AI-generated text.
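For illustration, the sketch below shows the kind of client-side pacing an organization using the API would need in order to stay within those limits; the submit_comment function is a placeholder rather than the actual Regulations.gov API call, whose endpoints and payload format we do not reproduce here.

```python
# Minimal sketch of client-side pacing that respects the stated limits
# (50 submissions per minute, 500 per hour, whichever comes first).
# submit_comment() is a placeholder, not the actual Regulations.gov API call.
import time
from collections import deque

MINUTE_LIMIT, HOUR_LIMIT = 50, 500
recent = deque()  # timestamps of submissions made in the last hour

def submit_comment(comment: str) -> None:
    print(f"submitting: {comment[:40]}...")  # placeholder for a real API request

def throttled_submit(comment: str) -> None:
    while True:
        now = time.monotonic()
        while recent and now - recent[0] > 3600:  # drop entries older than an hour
            recent.popleft()
        in_last_minute = sum(1 for t in recent if now - t <= 60)
        if len(recent) < HOUR_LIMIT and in_last_minute < MINUTE_LIMIT:
            break
        time.sleep(1)  # wait for the rate windows to slide
    submit_comment(comment)
    recent.append(time.monotonic())
```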

Another worry is that the government could be misled by AI-generated comments. The text from generative AI can be very convincing, even if it is entirely untrue. This is because LLMs draw from voluminous writings by humans. They are essentially very large text-prediction models that select the next word, phrase, or punctuation mark according to the sample texts they have been trained on (i.e., the texts fed to the model as examples for it to learn from); they are not a lookup table or an encyclopedia. ChatGPT is not connected to the internet and its training data stop in 2021, but it can still generate plausible analyses of current articles based on information gleaned from prompts and its training data. Even LLMs connected to the internet, such as Bing AI, have a tendency to make up or “hallucinate” information, especially in contexts lacking training data. Relatedly, they offer more tailored answers in areas where they have received more training. In this context, public comments on rules are available on Regulations.gov going back many years, making them a rich source of training data.
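To make the “text prediction” point concrete, the toy sketch below asks a small public model (GPT-2, via Hugging Face’s transformers library) for its probability distribution over the next token of a short prompt; the prompt is illustrative, and the same exercise happens at vastly larger scale inside models like those behind ChatGPT.

```python
# Toy illustration of next-token prediction: the model assigns probabilities
# to possible continuations of a prompt. Uses the small public GPT-2 model;
# the prompt and printed tokens are purely illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The agency proposes to amend the rule in order to"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # scores for the next token
probs = torch.softmax(next_token_logits, dim=-1)

top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r:>15}  p={p.item():.3f}")
```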

Given this backdrop, someone could prompt a generative AI tool to write a comment that supports or criticizes a rule based on fake scientific data or other technical information. Interestingly, the public can already submit fake information to an agency. It is currently the agency’s responsibility to wade through public comments and discern which information is or is not reliable; this is part of why it can take agencies months or years to finalize rules once they have been proposed. While the rulemaking process can be criticized for its length, taking the time to weigh public comments carefully helps protect the integrity of the rulemaking process against challenges, whether they come from generative AI or other sources. The possibility raised above of a large number of authentic, substantive, and varied comments does give us some pause because such comments would not violate the terms of service and could therefore spike agency workloads. It may be a challenge to balance the policy goal of a notice-and-comment process that is open to all with the reality of limited agency resources to consider so much information. This could be an area ripe for enhanced executive branch and congressional oversight of agency rulemaking.

Generative AI could also help agency staff summarize and respond to comments received on the rules. A strength of LLMs is their ability to process and compose information based on their training data; regulators could use this to their advantage, especially if agencies had access to a model trained on public comments or texts related to the content of a rulemaking. Further, LLMs are most useful when combined with expertise because the information produced by the AI can be verified and supplemented by those with subject matter or “topic” knowledge. For example, an LLM could help regulators summarize comments on the proposal, classify feedback based on predefined categories, and cluster information based on similarities in content, style, or other features. Then, agency staff could provide a rough outline of responses to comments and prompt a generative AI to format them in the style of a rulemaking published in the Federal Register. This workflow would incorporate a more sophisticated set of tools than what agencies currently use to analyze and respond to public comments. While we are not in a position to say whether the federal government will actually invest in generative AI technologies, the capability is there. One pathway would be to fine-tune existing models like OpenAI’s GPT-3.5 (the basis for ChatGPT) for the rulemaking context. This could entail an agency customizing an existing LLM to better apply to its rulemaking activity by conducting additional training with relevant texts. For instance, the Environmental Protection Agency (EPA) could fine-tune GPT-3.5 for use by its Office of Air and Radiation by feeding it examples of its responses to comments from prior rules and topic-specific materials on the Clean Air Act.
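The sketch below illustrates just the summarize-and-classify step of such a workflow, again assuming access to OpenAI’s Python client; the category labels, prompt wording, and model choice are placeholder assumptions, and any output would still need to be verified by agency subject-matter experts before it informs a response to comments.

```python
# Simplified sketch of the summarize-and-classify step: an LLM summarizes a
# comment and tags it with one of a few predefined categories. The category
# labels, prompt wording, and model choice are placeholder assumptions, and
# outputs would still require expert review.
from openai import OpenAI

client = OpenAI()
CATEGORIES = ["supports rule", "opposes rule", "requests clarification", "out of scope"]

def summarize_and_classify(comment_text: str) -> str:
    prompt = (
        "Summarize the following public comment in two sentences, then label it "
        f"with exactly one of these categories: {', '.join(CATEGORIES)}.\n\n"
        f"Comment:\n{comment_text}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(summarize_and_classify("The proposed fireworks rule would burden small retailers ..."))
```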

Using AI to support agency workflows, including analyzing public comments, sits in tension with current notions of who is supposed to do the “thinking work” of the government. One of us has written about this recently (with co-author Professor Rachel A. Potter) in the context of regulations that are drafted by government contractors, exploring whether drafting rules should be considered an “inherently governmental function” that is off-limits to contractors. While contractors can serve as vital supplements to agency capacity or expertise, overreliance on contractors can introduce conflicts of interest and other risks into the process. Generative AI offers an interesting twist on this concern. Might generative AI be more conflicted or biased than an outside contractor, less conflicted or biased, or might it simply present entirely different considerations? Because an LLM’s training data inform the way it generates text, using it to draft regulatory material could reinforce the status quo in some circumstances and, in others, help create new connections in human knowledge. Of course, existing approaches to crafting rules are not without their own biases, nor are other tools (e.g., Google searches) that have become commonplace in policymaking. We are only at the beginning of working through these issues as they apply to this essential form of executive branch lawmaking.

We limited this essay to generative AI, one of many technologies that intersect with rulemaking comments. Other tools could, for example, help alert people to rules that interest them and help the government catch errors and omissions in its analyses. The technology of rulemaking evolves along with the rest of society, and regulators should consider how to take advantage of the upside of tools like generative AI while minimizing their risks. One thing is sure: Given the likely pace of development for generative AI, the federal government needs to be prepared to adapt to this intriguing new set of tools.

Footnotes
    1. There are also personal risks for people whose names are used on such comments. The ACUS report works through some of these risks, too.
    2. To provide an example, when Professor Steve Balla asked ChatGPT to generate his biography, the chatbot mistakenly identified him as a professor at American University and credited him with authoring a book he did not write. In this case, the training data must have had enough information to correctly identify his field, but “predicted” he worked at another DC-based university.
    3. Some generative AI tools might refuse to comply with this kind of prompt. OpenAI integrated ChatGPT with a moderation tool to prevent it from producing prohibited content (e.g., sexual, hateful, violent, or promoting self-harm). Such guardrails are not guaranteed, however.
    4. This might be unlawful. Citing 18 U.S.C. § 1001, Regulations.gov warns that “[i]t is a violation of federal law to knowingly and willfully make a materially false, fictitious, or fraudulent statement or representation . . . including through comments submitted on Regulations.gov.” But we know of no instance of anyone being prosecuted for submitting false comments.
    5. The underlying GPT-3.5 models are not open source, but there is a publicly accessible API for using them. If an agency customizes an LLM, e.g., by fine-tuning an OpenAI GPT-3.5 model, then it will know the additional training materials used (and should make those public).