
Experimenting with experimentation: 4 model bills for tech policy trials

J. Scott Babwah Brennen, Head of Online Expression Policy, Center on Technology Policy - University of North Carolina-Chapel Hill
Matt Perault, Director, Center on Technology Policy - University of North Carolina-Chapel Hill

January 30, 2024


  • Reducing barriers to legislating may increase the pace and relevance of reform, helping policy to keep pace with technological change.
  • Policy product trial (PPT) programs may be particularly effective in testing new regulation for emerging technologies, such as generative AI models, because PPT embraces the idea that regulators and technology companies can learn more about both technology policy and technology products by trialing them together.
  • Regulators and industry should adopt a position of curiosity, using experiments to learn more about the interplay between tech policy and tech products.
Introduction

New federal laws regulating the tech sector in the United States are like sasquatches. They are often discussed and often mythologized. They are sometimes celebrated and sometimes feared. But rarely are they seen in the wild.

Members of Congress have introduced dozens of proposals to reform the sector in areas like privacy, antitrust, content moderation, and child safety. Yet despite the steady stream of new proposals, new laws governing new technologies remain largely in the realm of fiction. Several barriers have made it difficult to impose new regulations on the tech sector: the Senate filibuster; the Constitution, including the First Amendment; advocacy campaigns by the opponents of reform; and political divisions.

Some of these hurdles are nearly impossible to overcome. In a recent hearing on artificial intelligence, Senator Richard Durbin (D-Ill.) highlighted the structural impediments to reform: “When you look at the record of Congress in dealing with innovation technology and rapid change, we’re not designed for that. In fact, the Senate was not created for that purpose, but just the opposite.”

But there are two related barriers to new regulation that legislators could realistically address: uncertainty and permanence. The uncertainty about the impact of reform makes lawmakers hesitant to pass new laws. If an opponent argues that proposed legislation will disrupt users’ experience of technology, introduce cybersecurity risks, or undermine privacy, legislators may struggle to identify contrary evidence that will give them comfort in moving ahead with reform. For example, companies and trade associations argued that the suite of antitrust reform proposals introduced in the House and Senate in the 117th Congress would inhibit American competitiveness, a claim that was difficult to refute.

The permanence of reform also presents a challenge. Because it is difficult to pass reforms, if a lawmaker gets it wrong, the consequences can be severe for the online ecosystem. For instance, recent legislation that reforms Section 230 has come under heavy criticism, but the law remains on the books with little chance of being amended or scrapped.

These two factors combine to increase the stakes of potential regulation. The uncertainty surrounding new regulation means that success is far from guaranteed, and if a problematic law passes, the costs will endure. Lawmakers don’t want to be on the hook if they make a mistake that lasts years.

Legislators can address these two barriers through a shift in their theory of policymaking. Proposals for technology policy reform have taken the form of fixed laws of infinite duration, which run the risk of unintended consequences. But lawmakers could take a more curious, experimental approach. This alternative approach to policymaking—which we call “regulatory curiosity”—would enable lawmakers to test different regulatory approaches for fixed periods of time and to gather data that will help them understand the costs and benefits of those different models. As they learn, they can improve policy.

Recently, two experimental tools have gained prominence: policy experiments and regulatory sandboxes. Policy experiments involve testing new regulations in timebound, controlled settings. Sandboxes test new products while relaxing certain regulations.

Each of these models has merits, but they have also run into challenges. Critics argue that sandboxes relax essential consumer protections and create a regulatory “race to the bottom.” While policy experiments “aspir[e] to relevance”—that is, to offer realistic and scalable insights to pressing problems—they don’t always achieve this goal. Both policy experiments and sandboxes have sometimes failed to attract industry participation. Sandboxes have also become a partisan issue in the United States, with Republicans typically supporting them and Democrats often in opposition.

Alongside these two existing options for experimentation, we propose to add a new tool to the experimentation toolkit that combines the strongest features of policy experiments and regulatory sandboxes. Our policy product trial (PPT) model starts from the assumption that new regulations and products can and should be trialed together.

While policy experiments test new policies on existing products, and sandboxes test new products by relaxing existing policies, the PPT model would involve a government-run test of both new products and new policies simultaneously. A PPT experiment is short and time limited, involves a trial of new products, and provides short-term regulatory relief where necessary. It encourages communication between companies and regulators, along with data sharing, so that both industry and government can monitor the impact of the new products and the new policies. Companies must commit to share data as part of their participation.

Experiments run under this model would aim to create new regulation in dialogue with new product development, and to maintain open lines of communication between regulators and companies. The goal is to make products more responsive to regulatory concerns and regulation more responsive to product considerations. Unlike sandboxes, which were originally designed as a way of using regulatory relief to encourage companies to innovate with their products, PPT experiments are designed to encourage both companies and policymakers to innovate—not just with products, but with public policy as well.

Embracing this more curious approach to policymaking will allow policymakers to develop smarter regulation that better protects consumers, facilitates competition, and endures as technologies evolve. Experimental policy programs designed along these lines may be more likely to gain support from tech companies, who ideally would see the participation incentives as a reason to support regulation. Reducing barriers to legislating may increase the pace and relevance of reform, helping policy to keep pace with technological change.

In generating data as to the effectiveness of the policy measures tested, such experiments will also help governments gain empirical support for implementing these policies—which could prove helpful not only at the policymaking stage, but also in court. In recent cases, such as district court decisions enjoining online child safety laws in California and Arkansas, judges have been skeptical about government claims that legislation is “narrowly tailored” or “substantially related” to a government interest. In the California case, for instance, Judge Beth Labson Freeman repeatedly stated in her order granting a preliminary injunction against the law that the government had failed to produce evidence sufficient to establish the linkage between the specific provisions of the law and “the State’s substantial interest in protecting the physical, mental, and emotional health and well-being of minors.”

If the states were able to provide data on the efficacy of certain policy provisions for meeting certain government objectives, they might be more likely to meet this constitutional bar. Experiments are one way to produce this data. For this reason, experiments might not only contribute to the passage of more and better legislation but could also increase the chances that passed legislation will survive judicial review.

Experimenting with experimentation will provide a valuable alternative to the status quo. The United States once led the world in setting the regulatory agenda in tech policy, but it no longer does. Europe, the United Kingdom, Australia, and China are often cited as the leaders of regulation in the tech sector. America is lagging, and without a fundamental change in approach, it will fall further and further behind.

In this essay, we first outline two existing forms of experimentation: policy experiments and regulatory sandboxes. We describe their potential benefits, as well as their potential weaknesses. We then describe our new hybrid model: the PPT model.

Finally, we examine how this type of experimentation might work in practice. For this theory to become reality, state and federal lawmakers must develop specific proposals for implementing it. To that end, we provide a more detailed description of potential experimental approaches to public policy in three areas: two regulatory sandboxes in content moderation (one on online violence and one on user choice), a policy experiment in age verification, and a PPT experiment in generative artificial intelligence.

In addition, in an article on Lawfare, we provide a complete draft model bill for each program. Our hope is that by providing a menu of experimentation options, we will make it easier for curious lawmakers to make this model a reality.

Background: 3 tools for experimenting in public policy

Policy Experiments

As commonly used, the term “policy experiment” has come to refer both to a research methodology, mostly used by policy researchers to better understand different social interventions, and also to an “approach to governing.” More precisely, policy experiments are “a temporary, controlled field-trial of a policy-relevant innovation that produces evidence for subsequent policy decisions.” They can help produce evidence that informs policy implementation.

New regulation often has unintended consequences. In tech, perhaps the clearest example has been SESTA-FOSTA, which—while passed to curb sex trafficking—has had the unintended effect of harming sex workers. Policy experiments have the potential to reduce these unintended effects by testing out potential regulations in a time-limited and controlled setting.

Scholars trace the idea of policy experiments as a form of governing back to the early pragmatists. For the American philosopher John Dewey, “the whole of society might become a laboratory and every activity might be treated as an experiment.” By this, Dewey meant:

“…that the policies and proposals for social action be treated as working hypotheses, not as programs to be rigidly adhered to and executed. They will be experimental in the sense that they will be entertained subject to constant and well-equipped observation of the consequences they entail when acted upon, and subject to ready and flexible revision in the light of observed consequences.”

Civil society organizations, policy researchers, and governments around the world have overseen policy experiments for decades. For example, European governments have been running experiments in climate governance and testing climate mitigation approaches. Policy experiments in Universal Basic Income (UBI) have been conducted across the world over the last 40 years, some run by foundations or nonprofits and others by local or state governments. In March 2022, the city of Los Angeles began a UBI pilot called the Big Leap, under which 3,200 city residents received $1,000 each month for one year. To qualify for the program, a resident needed to be below the federal poverty line, have at least one dependent, and have experienced hardship related to COVID-19. Durham County in North Carolina just announced a new UBI experiment, which will provide $750 a month to 125 families.

But there have been fewer policy experiments in technology policy. And when such experiments have been conducted, they have tended to be more focused on issues such as procurement and standard setting. Between 1972 and 1982, for example, the National Bureau of Standards ran the Experimental Technology Incentives Program (ETIP), which carried out a series of experiments to test “whether changes in government procurement and regulatory policies…can stimulate demand for new technologies or speed their application throughout the economy.” Experiments included “the use of performance specifications,” “lifecycle costing,” and “value incentive clauses.”

Policy experiments provide several benefits. The Alliance for Useful Evidence, a British nonprofit, argues that “there is an ethical case for government to do experiments.” The organization notes that unless policies are rigorously tested ahead of time, the implementation will amount to its own troubling form of experiment—an “arbitrary” model in which “we will not learn if policies are doing more harm than good.” At the same time, other scholars have argued that policymakers can use experiments to shore up political support for new policies, deploying evidence from experiments as justification for passing new laws. In this way, experiments can help to “minimize any reputational risks” politicians may face in supporting programs, by allowing them to point to empirical evidence of success.

There are drawbacks to policy experiments as well. Researchers contend that political dynamics skew experimentation: While “objective” experiments have long been seen as a means of avoiding politics in favor of empirics, in a recent chapter, Sreeja Nair argues that “for many policy scientists, politics are an intrinsic part of experiments.” For example, while many different policy experiments have been proposed to mitigate climate change, policymakers are deeply constrained by political realities. Despite the potential insight that policy experiments may furnish, policymakers may be concerned about the risk of failure, and about “appearing to be in a constant indecisive mode of experimentation.” Consent poses another challenge: Although it is a key ethical principle of experimental design, it may not be feasible to obtain consent from every member of the public affected by a policy experiment.

Although policy experiments have attracted far less political opposition than sandboxes (see below), they may broadly conflict with many Republicans’ interests in reducing both regulation and the size of the federal government. Running policy experiments requires growing government offices and institutions.

Perhaps the most significant challenge is ensuring that policy experiments are relevant to governance in practice. For instance, to be responsive to the real concerns of practitioners, interventions may require participation of a wide range of groups across government, civil society, and industry. Yet in a recent article, Gerry Stoker argues that “achieving and sustaining buy-in is not always an easy task and can cause problems with the internal validity of the experiment.” In addition, in cases where an experiment does secure broad participation, these stakeholders can complicate the effective operation of an experiment.

Regulatory Sandboxes

A sandbox is the inverse of a policy experiment: a test of the relaxation of an existing law, rather than a test of a new law that does not yet exist. That is, if policy experiments test innovations in policy, sandboxes test innovations in product.

The contemporary form of regulatory sandboxes started in the United Kingdom in 2016, with a program run by the Financial Conduct Authority (FCA). The program, which has had eight cohorts, allows admitted financial technology companies to test new financial products with real consumers, while receiving guidance from regulators, alongside some relaxation of existing financial policy. Notably, the FCA can in some instances “waive or modify an overly difficult rule for the purpose of the test,” but it cannot waive regulation that falls outside its jurisdiction, such as national or international laws. It may issue a “no enforcement action” letter for existing laws that fall within its enforcement purview.

Since 2016, more than 50 other countries have adopted some form of regulatory sandbox for financial technology firms. These range from Malaysia to Switzerland to the United States, where at least 10 states—including North Carolina, Arizona, and Utah—have passed laws establishing regulatory sandboxes for state-based financial technology companies. After creating sandbox programs for the financial, legal, and insurance industries, Utah passed HB 217, creating an industry-neutral sandbox that can accept participants from across industries. Several other states, including Florida and Tennessee, have considered similar “universal” or “industry-neutral” sandboxes. In Europe, Spain is leading a regulatory sandbox on artificial intelligence.

While there have been few empirical analyses of the performance of these sandboxes to date, many regulators and commentators have touted the potential benefits of sandbox programs. Supporters have argued that sandboxes spur innovation, especially amongst start-ups—taking the view that existing regulation can prevent young companies from experimenting. One scholar observes that many believe “financial regulation is so pervasive that it is difficult to avoid it entirely, and sanctions for failing to comply with financial regulation can be weighty.” Relatedly, some have seen sandboxes as a means of incubating and attracting new innovations to a particular geographic location.

Some analysts have suggested that the greatest value of sandboxes is in facilitating cooperation between regulators and companies in ways that make firms more attractive to investors. One study suggests that the FCA program led to a 15% increase in capital raised over the following two years and a 50% increase in the probability of raising any capital. The analysis found evidence that this success is explained by the fact that “[s]andboxes could curb informational frictions through regulatory oversight and continuous dialogue between firms and the regulator during the testing period that offers reassurance to investors that firms meet their regulatory obligations.” Other supporters point to the value that sandboxes can bring to the policy development process: A successful sandbox can lay a foundation for improved future regulation by providing a strong evidence base and by helping regulators build capacity through their work with practitioners.

But sandboxes have also provoked significant opposition, especially in the U.S. and U.K. A Financial Times columnist describes the widescale embrace of sandboxes as part of a larger global trend, a “kind of race to the bottom among global regulators to set up the most ‘light-touch’ possible regimes so as to attract start-ups to their jurisdiction—whether or not they are offering consumers and investors anything useful.”

Others claim that government-run sandboxes help legitimize start-ups and companies that may present risks to consumers. Brown and Piroska describe the U.K.’s fintech sandbox program as engaging in “riskwashing”, which they define as: “a financial regulatory institution’s making products or processes of a company seem to involve less risk for stakeholders by engaging in activities that mimic in a superficial or narrow way genuine attempts to assess and reduce risk.” Deloitte recently interviewed a small number of participants in the U.K.’s sandbox and noted that some participants believe consumers may see sandbox membership as a validation of quality.

In the U.S., sandboxes have become the subject of partisan disagreement. Republicans generally support the wide expansion of sandboxes across industries, while many Democrats oppose them for relaxing or removing important consumer protection laws. Republican legislators have supported many of the state sandbox laws.

In 2019, the Consumer Financial Protection Bureau (CFPB) proposed adopting a sandbox program in the financial industry, along with a related policy change of “re-assessing” and likely removing the “data-sharing” and “time-period limitations” for no-action letters—statements by government agencies indicating they will not recommend enforcement against companies for engaging in certain practices. These proposals sparked criticism. A coalition of 80 “consumer, civil rights, legal services, labor and community groups and environmental groups” submitted a letter calling the proposal “an arbitrary and capricious and unlawful measure that could, in effect, give entire industries relief from complying with aspects of consumer protection laws.”

Republican support for the CFPB’s sandbox program may align with the party’s long-term policy goal of undermining the agency’s ability to enact and enforce regulations. Notably, some Republicans have pushed for permanent sandboxes that relax regulations indefinitely, rather than treating sandboxes as temporary opportunities to trial new products.

For their part, Democrats contested the CFPB’s 2019 sandbox policy as exceeding the agency’s authority. New York Attorney General Letitia James led a coalition of 21 state attorneys general opposing the CFPB’s implementation of the financial sandbox program. The coalition noted that:

“Under the Proposed Sandbox Policy, approvals or exemptions granted by the CFPB would purportedly confer on the recipient immunity not only from a CFPB enforcement action, but also from “enforcement actions by any Federal or State authorities, as well as from lawsuits brought by private parties.” The CFPB has no authority to issue such sweeping immunity absent formal rulemaking, and, in fact, the CFPB’s statutory authority for the approval and exemption relief described in the Proposed Sandbox Policy is quite narrow.”

Despite the opposition, the CFPB established these new rules and the sandbox program in 2019. These policies expired in September 2022 and were not renewed by the Biden administration. The CFPB’s new leadership “determined that the Policies do not advance their stated objective of facilitating consumer-beneficial innovation,” as well as that “the existing Policies failed to meet appropriate standards for transparency and stakeholder participation.”

Sandboxes in the U.S. have also faced challenges in attracting industry participation. In its 2022 annual report, the Nevada Department of Business and Industry noted that sandbox programs in the U.S. have seen few participants. It ascribed this low participation to disruption from the pandemic, “concerns of regulators and policymakers” regarding would-be participants, and “hesitation by potential applicants” about the value of participating.

A new model: The policy product trial (PPT) experiment

We propose a hybrid approach to experimentation that we call the policy product trial (PPT) experiment. This hybrid approach is designed to merge policy and product experimentation, while preserving some of the most compelling aspects of each model. It may also address some of the political and practical concerns that commentators have raised regarding sandboxes and policy experiments. Notably, PPT experiments do not indefinitely relax regulation; they offer strong incentives to companies to participate, and they embed monitoring into the architecture of the program to ensure that policymakers, researchers, and companies can evaluate the impact of the program.

Unlike sandboxes, which work best in heavily regulated sectors, PPT experiments could be useful in any regulatory environment. While some PPT experiments may involve relaxation of existing regulation, their ultimate goal is to encourage the implementation of regulation that is informed, reflexive, revisable, and attuned to on-the-ground reality in the tech sector.

PPT programs may be particularly effective in testing new regulation for emerging technologies, such as generative AI models, because PPT embraces the idea that regulators and technology companies can learn more about both technology policy and technology products by trialing them together. Policy development shouldn’t be on hold as products evolve—and neither should product development be on hold as policies evolve. A PPT experiment provides a venue for policy and product to evolve together.

Key components of PPT experiments

PPT experiments include four key components:

  1. PPT experiments are targeted and time limited. PPT experiments are organized around a singular policy and product idea and are limited in duration—which will help to minimize potential harm if something goes wrong and will keep the focus on the pilot nature of the project. These features contrast with some regulatory sandboxes—especially those run by U.S. states—that are long-running programs.
  2. PPT experiments aim to develop new products and new policies simultaneously. PPT experiments change public policy, either by testing a new policy or by revising existing policy, while also facilitating company trials of new products.
  3. PPT experiments include government oversight, transparency, and evaluation. PPT experiments include a PPT Audit Committee (or PPTAC) that reviews performance, including tracking the impact on vulnerable populations and other groups of concern. The committee would include a variety of stakeholders with direct experience and knowledge of the subject of the PPT experiment. To help measure impact, participating companies will be required to share data regarding performance in the PPT program, including data on compliance costs and product performance. The PPTAC would release an annual report on the efficacy of the program, and then produce a final report at the conclusion of the experiment that details specific recommendations for both product and policy improvements. These transparency measures are designed to facilitate policy and product development, enhance public accountability, and allay concerns that PPT experiments will be vehicles for industry capture.
  4. PPT experiments promote industry involvement by offering three incentives for participation: regulatory relief, communication with regulators, and the opportunity to participate in policymaking. First, a PPT experiment could relax certain regulatory requirements for products that are admitted into the program. Trialing products within a government-endorsed regulatory regime could provide companies with some insulation from regulatory risk, along with the public relations challenges they might otherwise face from testing a new product.

Second, PPT experiments facilitate communication between regulators and industry, where regulators can provide guidance and best practices regarding compliance and industry can provide information about products.

These features are similar to existing sandboxes. But because PPT experiments have more limited duration and more robust oversight requirements, they may be more likely to gain support from potential critics—or at least be subject to more muted criticism—and therefore may be more appealing for potential industry participants.

Finally, PPT experiments offer companies an opportunity to participate in the policy development process. By participating in these experiments, companies will be able to demonstrate the impact that new policies may have on product features and point to specific evidence about why particular policy provisions might be preferable to others. We believe this opportunity to contribute to the development of more informed public policy will serve as an incentive to participate.

Potential criticisms of PPT experiments

While they are designed to capture some of the advantages of both policy experiments and regulatory sandboxes, PPT experiments may also include some of the downsides of each. First, just as critics have described regulatory sandboxes as a means of weakening consumer protection laws, PPT experiments relax consumer protection laws in the short term.

Incurring some risk as part of a PPT experiment is necessary for testing new products and regulatory policies. Doing so within the controlled environment of the PPT experiment—which requires government oversight and transparency—can offer a more secure method of trialing these new approaches. Consumers will also likely benefit from smarter, evidence-based policies that have been developed in ways that allow for more understanding about their potential impacts.

Another potential risk is regulatory capture. Ongoing dialogue between regulators and companies provides those companies with an opportunity to advocate for their policy agenda and build stronger ties with regulators. Over time, regulators may tilt policy development in favor of industry. While some critics advocate for a strong firewall between regulators and the tech industry in order to avoid this outcome, we believe that ongoing communication between regulators and companies will give regulators a better understanding of new technologies and help industry build positive relationships with regulators and develop compliance expertise. The transparency measures described above are designed to promote public accountability and to help address concerns about capture.

Finally, pilots of policies and products may offer only limited insight into the impacts of these policies and products when they are deployed at scale. Yet even if pilots may offer only incomplete information, they still shine a light on real-world effects that might otherwise remain shrouded in uncertainty. Lawmakers and company executives can then use these insights to develop better policies and products in the future.

Experimental tools in practice

We view tech policy experimentation as a practical means for accelerating and improving the development and implementation of tech policy reform. Even so, the leap from theory to practice is a difficult one.

To facilitate this transition, we describe experiments on three different topics in technology policy, drawing on all three of the experimental tools outlined in this essay: sandboxes, policy experiments, and PPT experiments.

We propose two sandboxes in content moderation (one on online violence and one on user choice); a policy experiment in child safety and age verification; and a PPT experiment in generative artificial intelligence. In an article on Lawfare, we provide examples of model legislative language for each program.

In each of these areas, policymakers have struggled to advance reform. Our hope in offering these more granular proposals is to establish a starting point for policy debates about specific experimental models and to provide policymakers with specific options they can use to advance reform efforts. We see these proposals more as examples of how PPT experiments could be used, rather than as arguments for specific policy proposals. We would applaud alternative experiments as a means of breaking through the current impasse in tech policy, even if they deviate from the specifics of what we suggest here.

By giving lawmakers the chance to test their assumptions about regulatory models in settings where the stakes are lower, regulatory curiosity has the potential to expand opportunities for policymaking. The long-term goal is to create a better internet, with more innovation, stronger protections for consumers, and a deeper understanding of how technology impacts our society.

Policy sandbox on online violence

Over the last seven years, Democrats have expressed concern over the lack of action taken by platforms to remove or demote potentially harmful content, including hate speech, disinformation, and online violence. While platforms continue to revise their policies on extreme speech, some governments have considered and passed new requirements that platforms more aggressively remove extremist content. In the United States, policymakers have introduced and debated similar proposals. Some of these provisions, however, may be limited by existing federal law.

Some mechanisms already exist for platforms to coordinate on responding to extremism. The independent Global Internet Forum to Counter Terrorism (GIFCT), for example, runs a program where platforms can share hashed examples of terrorist and violent extremist content, which can then be identified across platforms and removed. However, it may be beneficial for platforms to also share data about the users who share this content. For instance, to limit the spread of terrorist content on its platform, X might have an interest in reviewing the full body of activity of a user who was removed from YouTube for sharing this kind of content.
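To make the mechanics of hash sharing concrete, the following is a minimal Python sketch of the general idea. It is purely illustrative: the function names and the in-memory set standing in for a shared database are our own inventions, it uses an exact cryptographic hash for simplicity, and it does not reflect GIFCT’s actual systems, which rely on perceptual hashing so that near-duplicate images and videos can also be matched.

import hashlib

# A toy, in-memory stand-in for a shared industry hash database.
# Real systems typically use perceptual hashes (so near-duplicates match);
# SHA-256 here only matches exact copies of a file or string.
shared_hash_db = set()

def hash_content(content: bytes) -> str:
    """Reduce a piece of content to a fixed-length digest."""
    return hashlib.sha256(content).hexdigest()

def contribute(content: bytes) -> None:
    """Platform A adds the hash of violating content it has removed."""
    shared_hash_db.add(hash_content(content))

def is_known_violation(content: bytes) -> bool:
    """Platform B checks an upload against the shared hashes without
    ever seeing the original material that produced them."""
    return hash_content(content) in shared_hash_db

contribute(b"example of violating content removed by platform A")
print(is_known_violation(b"example of violating content removed by platform A"))  # True
print(is_known_violation(b"an unrelated post"))                                   # False

A hash-sharing arrangement of this kind lets platforms act on the same known content without exchanging the underlying material itself; sharing information about the users behind that content is a different matter, as discussed below.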

Yet there are legal barriers to sharing this kind of user data between companies. For example, new state-based data protection laws may prevent platforms from sharing users’ personal data. Antitrust laws may prevent platforms from cooperating with each other. And laws against unfair and deceptive acts and practices (UDAP) may prevent a company from removing a user on one platform based on activities on another.

To experiment with product and policy options that could address the challenges inherent in this issue, policymakers could enact a two-year regulatory sandbox, where participating companies agree to share personal information of users that have been actioned or removed for terrorist recruitment, distribution of terrorist content, or glorification of terrorist action. The government would agree to not enforce existing UDAP or antitrust laws for program activities undertaken in good faith. As part of this sandbox, platforms could also test policies and enforcement procedures that could help to reduce user exposure to online violence, as well as options for sharing relevant data with law enforcement and researchers.

Importantly, there must be limitations on both platform and government access to these shared data. Participating platforms must agree to not use these data beyond the requirements set out in the experiment. Unless required by law, these data would not be shared with law enforcement.

A sandbox audit committee (SAC) would monitor the performance of this experiment. The SAC would be tasked with overseeing implementation, including whether platforms adhere to the experiment’s terms. Admitted platforms must also agree to provide relevant data to the SAC so that it can assess performance and compliance, though platforms would not be required to provide data when they are legally barred from doing so. The SAC would be made up of experts from a variety of relevant fields, including content moderation, online safety, domestic terrorism, technology, and law enforcement. The SAC would also publish two reports: one at the end of the first year and a final report at the experiment’s conclusion. These reports would describe the efficacy of the program, how platforms used the shared data, and the impact of the program on content quality and user privacy.

Policy sandbox on user choice in content moderation

Republicans’ concerns about platform content moderation practices have focused less on allegedly harmful content allowed to remain up and more on the alleged harms of removing legal content. Republicans claim that platforms exhibit an anti-conservative bias, arguing that content policies and enforcement disproportionately target conservative political speech. Although little data has surfaced to support the allegation—indeed, even Republican-led audits commissioned by platforms have failed to discover evidence—the concern persists, and platforms have struggled to develop a response that assuages Republican lawmakers and users.

Platforms might be better able to address these concerns if they were to develop tools to customize moderation responses for different sets of users. Some platforms have only a binary choice for addressing content that may violate their terms: They can either leave it up or take it down. Other platforms can downrank content that they identify as harmful but that does not violate their terms. But even downranking is a blunt tool: It is typically applied universally across a platform and not customized to an individual user’s content preferences. Because of these limited moderation options, users may be denied the choice to see content that is barred by a platform’s terms, even if it is not barred by law, such as nudity and spam.

There are many reasons why platforms have not yet developed such tools: cost, technical limitations, and concerns about user experience. But they might also be concerned that some moderation choices could result in consumers being exposed to more problematic or harmful content—for example, if users elect to receive content that promotes vaccine misinformation. Likewise, other choices could be politically perilous in the absence of some regulatory protection. A user tool that permits or restricts political speech might invite the attention of state attorneys general, the FTC, or the Justice Department. For example, if a platform allows users to view more right-leaning content that Democrats deem to be harmful, such as election denial content, it might attract the ire of Democratic attorneys general. Similarly, if it allows users to view more left-leaning content that Republicans deem to be harmful, such as certain anti-racist educational tools, it might invite investigations from Republican attorneys general.

To encourage platforms to develop such tools, we propose a two-year regulatory sandbox in which admitted platforms offer users more granular tools for selecting the content they want to see. The sandbox would provide regulatory relief for any products tested in the experiment, so long as the participating company follows all experiment requirements in good faith. A user could choose to receive more political content or, with more granular controls, more liberal or conservative content. Users might also be able to reduce the stringency of spam filters, which sometimes catch political speech that has properties similar to spam. Alternatively, they could choose to receive more or less “borderline” or “lawful but awful” content.

These tools might have bipartisan appeal: A left-leaning user might elect a more restrictive approach to certain types of “harmful” speech or might decide to see more left-leaning content, whereas a right-leaning user would have the ability to make different choices. Of course, these tools might also apply to non-political speech. A user could choose to receive more sports-related content or more content from a favorite sports team.
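As a rough illustration of what more granular, user-level moderation controls might look like in practice, here is a short Python sketch. Every field, threshold, and classifier score in it is hypothetical; real ranking systems are far more complex, and nothing here corresponds to any platform’s actual implementation.

from dataclasses import dataclass

@dataclass
class UserModerationPrefs:
    # All fields are illustrative; a real product would define its own taxonomy.
    political_content: float = 1.0      # 0 = suppress, 1 = platform default, 2 = boost
    borderline_tolerance: float = 0.0   # appetite for "lawful but awful" content, 0 to 1
    spam_strictness: float = 1.0        # lower values relax the spam filter

@dataclass
class PostSignals:
    # Hypothetical classifier scores attached to a post, each in [0, 1].
    political: float
    borderline: float
    spam: float

def rank_adjustment(post: PostSignals, prefs: UserModerationPrefs) -> float:
    """Return a per-user multiplier on a post's base ranking score,
    instead of one platform-wide leave-up, take-down, or downrank decision."""
    score = 1.0
    score *= 1.0 + (prefs.political_content - 1.0) * post.political
    score *= 1.0 - (1.0 - prefs.borderline_tolerance) * post.borderline
    # Stricter spam settings lower the score at which a post is dropped entirely.
    spam_cutoff = 1.0 - 0.5 * prefs.spam_strictness   # 0.5 at the default setting
    if post.spam >= spam_cutoff:
        return 0.0
    return score

# Example: a user who opts in to more political content and a looser spam filter.
prefs = UserModerationPrefs(political_content=1.5, spam_strictness=0.5)
post = PostSignals(political=0.9, borderline=0.2, spam=0.3)
print(rank_adjustment(post, prefs))

The point of the sketch is simply that moderation can operate as a per-user adjustment rather than a single platform-wide decision, which is what distinguishes these tools from the binary and downranking options described above.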

Because of the potential for increased distribution of harmful content, all applicants should be required to submit risk assessments and credible mitigation plans for potential risks consumers face in using these tools. Particular emphasis should be placed on the equity impacts of these product features, such as the impact on women, people of color, and rural communities. Mitigation plans will help platforms to develop other interventions to limit the negative consequences of access to that content without removing or downranking it. For instance, potential vaccine misinformation could be accompanied by additional data about the information source, enabling users to better assess its credibility.

Like the other experiments described here, this one would establish an audit committee to conduct oversight of the experiment, including gathering relevant data from participants and publishing annual reports on the products tested and the performance of the experiment to date. These reports would include information about the equity impacts of products and policies tested in the sandbox.

Policy experiment on age verification certification

There is broad consensus that tech platforms should avoid harm to children and that federal and state public policy should establish strong protections for children online. Yet despite this consensus, experts disagree on the best strategies for making the internet safer and healthier for children. Age assurance is a key sticking point for online safety reform: In order to protect children, after all, a platform first has to identify which users are underage. Doing so, however, can result in more surveillance of both children and adults, as well as equity, usability, and security concerns.

Consider a recently passed law in Arkansas, which requires companies to employ third-party age verification tools to ensure minors have parental consent to use social media. However, companies have no real way of ensuring that third-party providers offer effective services, minimize the collection and use of personal data, store and process data securely, or guarantee their systems are equitable.

The reality is that neither policymakers nor platforms have sufficient data to understand the best way forward on age assurance. There are many ways to build an age-verification system: What are the design features that are most likely to produce the results we want? There are no perfect age assurance solutions; each method involves balancing tradeoffs. For example, accurately assessing a user’s age requires collecting and processing personal data, raising significant privacy and data security concerns. Alternatively, while collecting government IDs may offer an accurate method of verifying a user’s age, doing so raises design, usability, and equity concerns, as not all users have legal and up-to-date government IDs. There are also benefits and drawbacks to situating the verification tool at the app layer versus in the app store. It is essential that policymakers accurately understand the limitations and advantages of different methods in practice, so they can realistically balance costs and benefits.

At the same time, as more companies choose or are compelled to revise their age verification or assurance processes, they face significant challenges. Companies must either balance the tradeoffs inherent in age assurance themselves or choose a vendor to enact new assurance processes. Not only is there little guidance for platforms on how best to implement age assurance programs, but it can also be difficult to ensure that vendors are employing industry best practices.

A policy experiment on age verification would help to surface these data, enabling platforms and policymakers to experiment with different strategies for protecting children and allowing experts to work with platforms to measure success. One option would be an experiment to test a state or federal voluntary certification program for both first- and third-party age verification systems like those required in Arkansas.

For this policy experiment, a new working group would create and then test a set of certification standards that ensure that companies are minimizing privacy concerns, storing and processing data securely, and addressing equity concerns. The working group could be situated within the FTC or as part of a state consumer protection agency.

To join the experiment and gain certification, companies would need to submit data detailing the accuracy and error rate of their age assurance technology. They would need to resubmit each year to maintain certification, so the working group could review any important operational changes. If the program were run through the FTC, once the commission certified an age assurance product, it would commit to refrain from enforcement actions related to that product. A state-level agency could act similarly.
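As an illustration of the kind of accuracy and error-rate reporting a certification submission might contain, here is a brief Python sketch. The figures, the under-18 cutoff, and the choice of metrics are invented for the example; the actual measurements a working group would require are precisely the sort of question this experiment is meant to answer.

# Hypothetical evaluation of an age assurance tool against verified ages.
# The data, the under-18 cutoff, and the metric choices are illustrative only;
# a real certification program would specify its own required measurements.
verified_ages  = [12, 15, 16, 17, 19, 21, 24, 30, 35, 44]   # ground truth
estimated_ages = [14, 16, 15, 19, 18, 20, 26, 29, 37, 41]   # tool's estimates

def mean_absolute_error(truth, estimates):
    return sum(abs(t - e) for t, e in zip(truth, estimates)) / len(truth)

def under_18_error_rates(truth, estimates, cutoff=18):
    """False negatives: minors the tool treats as adults (a safety failure).
    False positives: adults the tool treats as minors (a usability burden)."""
    minors = [(t, e) for t, e in zip(truth, estimates) if t < cutoff]
    adults = [(t, e) for t, e in zip(truth, estimates) if t >= cutoff]
    false_negative_rate = sum(1 for _, e in minors if e >= cutoff) / len(minors)
    false_positive_rate = sum(1 for _, e in adults if e < cutoff) / len(adults)
    return false_negative_rate, false_positive_rate

mae = mean_absolute_error(verified_ages, estimated_ages)
fn_rate, fp_rate = under_18_error_rates(verified_ages, estimated_ages)
print(f"Mean absolute error: {mae:.1f} years")          # 1.6 years
print(f"Minors classified as adults: {fn_rate:.0%}")    # 25%
print(f"Adults classified as minors: {fp_rate:.0%}")    # 0%

Distinguishing the two error directions matters because misclassifying a minor as an adult is primarily a safety failure, while misclassifying an adult as a minor is primarily a usability and equity burden—exactly the kind of tradeoff the certification standards would need to weigh.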

Platforms may wish to select an age assurance tool certified as part of this policy experiment and may be more inclined to utilize one of these tools if they receive some liability protection for that usage. Accordingly, a platform interested in using a third-party vendor could apply to participate in the experiment. If admitted, then the platform would receive relief from enforcement related to use of this vendor.

The experiment would run for two years, after which time the FTC or state consumer protection agency would release an evaluation report on the program. Participating companies—including both platforms and age assurance vendors—would be required to share performance data with the FTC to inform this evaluation. The report would include information about the equity impacts of products and policies tested in the experiment, such as the impact on women, people of color, and rural communities. This experiment could inform the development of a more permanent certification program.

PPT experiment on generative artificial intelligence regulation

Generative artificial intelligence (GAI) has captivated public, press, and policymaker attention. There has been rampant speculation about the potential harms of the technology. That speculation has also fed into increased interest in how regulators might govern the technology. A number of potential models have been proposed by legislators and industry: a new federal agency to govern AI, licensing requirements, audits and risk assessments, enhanced transparency mandates, limits on training data, opt-outs for IP-protected content, and an international agency styled on the UN’s nuclear agency, among others. In Europe, Spain is leading a regulatory sandbox on artificial intelligence.

Despite interest in the topic and regulators’ eagerness to intervene, uncertainty dominates the policy landscape. What harms should policy aim to address? Will policy intervention hinder innovation in the field? Alternatively, is some level of regulation necessary to enable the field to flourish and protect the public from potential harms?

As policymakers discuss the potential for new regulation, there is a concurrent debate about how existing law applies to the new technology. Specifically, several commentators—including one of us—have argued that GAI platforms will not receive Section 230 protections in most instances. In cases where GAI platforms generate new content, they are likely to be found to be “information content providers” rather than “interactive computer services,” and will therefore lose the ability to use Section 230 as a defense. Supreme Court Justice Neil Gorsuch made a similar statement during oral arguments in Gonzalez v. Google. Others, meanwhile, have taken the opposing view, arguing that most applications of GAI will receive protection. If platforms do lose Section 230 protections for GAI products, the resulting liability risks hamstringing the technology while it is still in its nascent phase.

This legal landscape provides a ripe environment for exploration. Might a new intermediary liability regime for GAI be able to strike a balance between empowering companies to innovate on the one hand, and protecting consumers and society from harmful uses of the technology on the other?

A PPT experiment could help shed light on the policy and product implications of creating a new limited intermediary liability regime for GAI technologies. In this test, a GAI platform would receive Section 230-style protections for content that its model produces in response to a user prompt, unless a plaintiff could show that the platform was solely responsible for creating the content. For example, a GAI platform might be deemed to be “solely responsible” if it “hallucinates” defamatory speech, essentially inventing a response. This PPT experiment would also invite GAI platforms to trial products that use this new liability regime, such as chatbots and content moderation features. This experiment is a PPT because it changes the legal regime that governs GAI products while also inviting product experimentation.

The “sole liability” exception constitutes a change from the existing text of Section 230, where a platform is liable if it creates or develops content “in whole or in part.” This current statutory definition likely strips GAI platforms of the ability to use Section 230 in a broad range of cases, since GAI tools are designed for the purpose of developing content at least “in part.”

Yet if policymakers immunize GAI platforms in all cases where they produce content in response to a user prompt, platforms would receive protections even in cases where they hallucinate—or invent—illegal content. Narrowing the definition to focus only on cases where a platform is “solely” responsible will help test the costs and benefits of affording platforms broader liability protections, while still maintaining enough liability to hopefully incentivize platforms to improve GAI technology. If the goal is to determine a liability regime that incentivizes companies to engage in socially desirable behavior and mitigate socially harmful behavior, then the question is whether this experiment improves upon the status quo.

This exception also tests the administrability of drawing a fine liability line. Will a “sole liability” exception give companies the comfort they need to test valuable new tools, or will the legal risk still be so high that they constrain their product development out of a fear of being hauled into court? If the exception works on paper but not in practice, then it is not a viable option.

Companies participating in this PPT experiment must agree to oversight by an audit committee and must commit to release relevant data to that committee. The audit committee would publish annual reports on the performance of the PPT experiment, which would include a review of specific costs and benefits associated with the tests, including their impacts on equity. Reports should also include recommendations for lawmakers on intermediary liability frameworks that will maximize the benefits and minimize the risks of GAI technologies.

Because this PPT experiment would require GAI platforms to produce information about product tests for the purposes of oversight, it would also shed light on how to conduct audits of GAI technologies. While there has been broad agreement that AI systems should be audited in some form—audits have been central to many passed and proposed regulations, appearing in the EU’s AI Act, President Biden’s recent executive order on AI, and OpenAI CEO Sam Altman’s congressional testimony—the precise mechanisms for conducting those audits are far less clear. This PPT experiment will help to provide data on the auditing process that could inform the development of best practices. To capitalize on this opportunity, the annual reports should identify challenges the committee might have faced in carrying out its auditing responsibilities, the strategies it used to address those challenges, and further recommendations for conducting GAI audits.

Conclusion: How we should experiment with experimentation

Though often at odds, technology companies and regulators have much in common. Regulators often feign certainty—acting as if solutions are obvious and the benefits of reform inevitable. Tech companies often feign certainty in the other direction—insisting that there is no viable alternative to the status quo and emphasizing the potential drawbacks of reform.

Regulatory curiosity offers an alternative path. Regulators and industry should adopt a position of curiosity, using experiments to learn more about the interplay between tech policy and tech products. Curious experimentation will inspire smarter regulatory options and better products over time.

Here, we offer lawmakers a set of tools for how they might embrace this curiosity by using policy experiments to test new policies, regulatory sandboxes to test new products, or our PPT experiment to test how new policies interact with new products. In Lawfare, we go one step further, proposing model bills that could bring these hypothetical models to life. By providing tangible options for experimentation in content moderation, child safety, and generative artificial intelligence, our hope is that lawmakers will take steps to implement this more curious approach to tech policy reform.

Footnotes
    1. In an analysis of the impact of this legislation, entitled “The politics of Section 230 reform: Learning from FOSTA’s mistakes,” Quinta Jurecic notes that “Congress seems to have little interest in reviewing its past work.”
    2. In 2022, Senators Warren (D-Mass.) and Wyden (D-Ore.) and Representatives Khanna (D-Calif.) and Lee (D-Calif.) introduced the SAFE SEX Workers Study Act, which would have directed the Department of Health and Human Services (HHS) to study the impacts of SESTA/FOSTA.
    3. Notably, one report suggests there could be “policy testing” sandboxes. Rather than a sandbox designed to test new products, this one would test whether new government policies may “impede beneficial new technologies or business models” (UNSGSA, 2019, p. 27). Despite a reference to support for this model by the Monetary Authority of Singapore, there has been little discussion of “policy testing” sandboxes, and we could find no examples of these programs actually in practice.
    4. For an exception, see a recent report from the Bank for International Settlements.
    5. Notably, in this way, sandboxes, like policy experiments, can also help drive new innovations in policy, even though sandboxes start with experimentation on the product side. The somewhat fuzzy boundary between a sandbox and an experiment is one of the key motivators for our hybrid model described below.
    6. Hashed content is content that has been transformed into a fixed-length output. This allows platforms to identify illegal or problematic content without having to view the original content.
    7. For instance, the FTC often uses Section 5 of the FTC Act to initiate enforcement proceedings related to how companies treat minors.