Dual-use regulation: Managing hate and terrorism online before and after Section 230 reform

The old military aphorism that “the enemy gets a vote” is oft forgotten in both Silicon Valley and Washington, D.C. This cliche is worth keeping in mind as Congress debates adjustments to Section 230 (§230) of the Communications Decency Act. For starters, Silicon Valley’s persistent inability to ground products in the knowledge that some users will deliberately abuse them, and the unsurprising abuse that results, motivates many in Washington to adjust §230’s liability protections. But policymakers, intent on taming platforms, must not inadvertently empower the truly dangerous actors (terrorists, child predators, hate groups) who will abuse any technology and any legal recourse created by adjustments to §230.

Harm manifests online, sometimes in world-changing ways. To some, this is evidence that the current regulatory regime should shift, and I grudgingly agree. But bad policy could very well make things worse, especially when it comes to high-severity, relatively low-prevalence harms like terrorism and hate. This paper distinguishes those issues from the sometimes-related but ultimately broader issue of misinformation that often manifests as higher-prevalence, lower-severity harms. On the former, policymakers should keep three core ideas in mind as they move forward.

“Terrorism and hate are not an intentional feature of digital communications, but they are an inevitable bug and should be treated as such by all responsible platforms.”

First, the abuse of the internet by terrorists and hate organizations is persistent and longstanding. The issue is not hypothetical, as inaccurately claimed by the U.S. Court of Appeals for the 5th Circuit in its September 2022 decision upholding the constitutionality of a Texas law restricting platforms’ freedom to moderate content. To the contrary, this abuse has been a ubiquitous and consistent reality for nearly 40 years. Indeed, such abuse long predates the modern search, recommendation systems, and ad-based business models often blamed for current digital dysfunction. The 5th Circuit’s head-in-the-sand decision is deeply problematic, but the longevity of these digital harms also suggests that conventional wisdom now ascribes too much responsibility for them to a narrow set of design features. Such systems do pose unique challenges for managing hate and terrorism—and especially misinformation and disinformation at scale—but a narrow focus on those features encourages an unhealthy naivete about product risks tied to a wider set of risky features. It also tends to ignore the agency of the extremists themselves, who have proven to be adaptive and resilient. They have abused, do abuse, and will abuse a wide range of products. Policymakers, civil society, and Silicon Valley should encourage product development and legislative remedies with that dynamism in mind. Terrorism and hate are not an intentional feature of digital communications, but they are an inevitable bug and should be treated as such by all responsible platforms.

Second, the work that companies do to ensure the safety and security of their platforms—known within the industry as “trust and safety”—is extraordinarily complicated and often involves restrictions on terrorists and hate organizations far more intensive than those that the government (at least a government bound by the First Amendment) could reasonably apply. It is important to understand what trust and safety efforts conducted by platforms really entail. These efforts all operate under the protections of §230 and fundamentally shape the internet as we experience it. Yet these programs, in part because of a lack of transparency by platforms, are often misrepresented in policy debates. Platforms were very late to address violent extremism online and have done so imperfectly. Those efforts only emerged after significant government pressure. The industry’s failures are regularly cited as evidence that §230 should be amended in an effort to incentivize more effective efforts to counter hate and violence. But that laudable intent must be weighed against the fact that many platforms already voluntarily go far beyond their legal requirements.

Regulation incentivizing companies to focus on removing the subset of proscribed content that is illegal will de facto disincentivize them from pursuing a broader set of harms. Such incentives are likely to produce over-enforcement against legitimate internet users who highlight political or cultural causes also embraced by legally proscribed entities, such as Palestinian nationalism, Kashmiri independence, or narcocorrido fandom. Those same incentives may result in companies shifting resources away from hate groups that are not legally banned, like the violent extremist group The Base or the Ku Klux Klan. In a worst-case scenario, regulation might disincentivize companies from investigating threats on their own platforms and providing that information to law enforcement. Although it rarely makes headlines, companies regularly identify real-world dangers before third parties do; if exposed to liability for identifying such threats and referring them to law enforcement, platforms may be deterred from doing so.

Third, adjustments to §230 risk litigation by violent or hateful actors against platforms themselves. In the United States, such lawsuits from white supremacists are now routinely dismissed based on §230. By contrast, in Italy a neo-fascist group successfully sued Meta (formerly Facebook) for removing its page, and the page was reinstated as a result. Meta also now faces an order to pay fines directly to the neo-fascist group itself. Calls to allow civil action against platforms that unintentionally host noxious content aim to provide relief to victims and incentivize platforms to more aggressively police such material. Penalties for egregious misconduct may indeed be warranted. But this is perilous ground if it inadvertently creates standing for hate groups to pursue civil action against platforms for removing content that is legal but violates platform rules.

Ultimately, reform in the United States should begin by requiring platforms to provide detailed disclosures about their efforts to defend against abuse rather than by creating strict liability regimes for moderation errors. All platforms with user-generated content should be required to disclose their efforts, with higher expectations for more complex platforms. Importantly, rather than indexing requirements to user numbers, revenue, or market capitalization, regulators should scale up obligations based on the number of “surfaces”—that is, spaces for user-generated content—that a platform must defend. Administering such a regulatory system would require an expert bureaucracy, very careful definitions, and buy-in from Congress for a long-term solution—a much more difficult proposition than a politically satisfying, but ultimately hollow, quick fix.

Quick 230 Primer

Section 230 has two basic provisions. The first immunizes platforms from liability for content that they distribute but that is produced by their users; the second protects the ability of platforms to moderate content without incurring increased liability. The law contains important exceptions to these core principles, designed to facilitate enforcing federal criminal law, intellectual property law, state-level laws regarding prostitution and sex trafficking, and federal civil cases related to sex trafficking.

It would be impossible to list here the many proposals to amend §230. Some would increase platform liability for hosting illegal content, others would facilitate civil liability for hosting content related to real-world harm, and some would do away with the provision entirely.

“Many advocates of §230 reform hope that removing explicit liability protections will convince platforms to moderate more content.”

More foundationally, it’s not clear how much §230 actually expands on First Amendment protections of platform decisions about speech. After all, the First Amendment protects private actors, like platforms, from government efforts to restrict or compel speech. That concept seemingly applies to a private actor’s decisions about restricting, removing, or promoting others’ content. Nonetheless, the legal regime predating §230 suggests platforms might face liability for content moderation decisions without it. One court found that early web innovator Prodigy accepted increased liability because it moderated content to reflect a “value system that reflects the culture of the millions of American families we aspire to serve.” On the grounds that Prodigy was trying to keep pornography from overrunning its platform, the court decided that the company accepted increased liability for noxious content that it failed to remove. Congress crafted §230 in part to help websites out of this bind.

Many advocates of §230 reform hope that removing explicit liability protections will convince platforms to moderate more content. The Prodigy decision suggests doing so might create the opposite incentive. Moreover, legislation like the Texas Social Media law indicates that some reformers intend to use legislation to prevent responsible content moderation rather than encourage more of it.

This would be a disaster for those interested in a functional internet, let alone a safer one. For all the failures of Silicon Valley to moderate content—and there are many—voluntary efforts by platforms to remove and reduce noxious material have dramatically improved since the 2000s. Such efforts helped drive the broad adoption of the internet and have moved far beyond what the United States government could possibly mandate without running afoul of the First Amendment. This work by platforms reflects an awesome and dangerous power and thus demands much-improved oversight. But it should be neither prohibited nor inadvertently disincentivized.

A Brief History of Extremism Online

“The ‘original sin’ of the internet is far more foundational than the relatively recent development of contemporary advertising models.”

In November 2021, the founders of Substack, an online newsletter publishing service, argued that extremism and other digital harms flourished on Facebook because the platform is, “like many of its peers, captive to that original sin of the internet, the ad-based business model.” The critique echoed a wide range of other critics of “Big Tech,” many of whom argue that online harms are largely a function of a narrow set of relatively new features, including not only advertising but also recommendation engines that suggest content or accounts that users should engage. There is evidence suggesting those features are problematic, especially for high-prevalence harms like misinformation, but the wider research picture is much cloudier than many advocates are keen to admit. Moreover, the Substack argument misses a core point: extremism online long predates those features. The “original sin” of the internet is far more foundational than the relatively recent development of contemporary advertising models.

The internet’s true, and still prevalent, “original sin” is the naivete of founders, funders, and tech visionaries who assumed that technical progress and digital connection inevitably lead to better social outcomes. This understanding was based, in large measure, on the idea that the internet would free individuals from the overbearing power of the state. John Perry Barlow’s famous Declaration of the Independence of Cyberspace has come to embody that optimistic vision of the internet. Declaring that governments “have no moral right to rule [denizens of cyberspace] nor do [they] possess any methods of enforcement we have true reason to fear,” Barlow hoped that cyberspace would lead to a more “humane” world.

Contemporary tech critics often reference his sunny vision of the digital future as a counterpoint to an increasingly dystopian understanding of our current internet. There is no doubt that today’s internet houses innumerable dangers. But in an effort to condemn the contemporary platforms, thinkers sometimes lionize Barlow’s utopia as if it actually existed. It did not.

By 1996, the year that Barlow declared the independence of cyberspace and §230 became law, the first generation of American white supremacist bulletin boards was more than a decade old. The same year that Barlow made his declaration, the Anti-Defamation League documented a slew of antisemitic and white supremacist websites operated in the United States. The first dedicated salafi-jihadi websites, including those run by al-Qaeda, were launched in the mid-1990s. By 1998, more than half of the U.S. government-designated Foreign Terrorist Organizations (FTOs) had websites. Two years later, nearly all of them did, according to an influential early study. In April 2004, the internet-monitoring firm SurfControl claimed it had found nearly 11,000 websites advocating hate or supporting violence. Though 9/11 is not usually considered a terrorist attack planned on the internet, Mohamed Atta researched flight schools online. The website of Jund al-Islam, a Kurdish jihadi group that became part of the justification for the 2003 invasion of Iraq, was hosted on Yahoo-owned Geocities.

Barlow’s dystopian doppelgangers in the global jihadi and American white supremacist movements celebrated the same technology he did. And they did so for many of the same reasons. They saw the internet’s promise of decentralized, relatively secure communication as a way to pull geographically distant communities into cohesive movements outside of government control. A decade prior to Barlow’s “Declaration of Independence,” the Klansman Louis Beam built early digital bulletin boards for distributing white supremacist propaganda. Beam argued that structured organizations would flatten because “any patriot in the country is able to tap into this computer at will in order to reap the benefit of all accumulative knowledge and wisdom of the leaders.” Such “Leaderless Resistance,” he argued, would insulate the movement from government crackdowns. He also claimed it would make extremist movements transnational, specifically by giving Canadian white supremacists, who faced stronger prohibitions on racist material, a way to access propaganda from the United States, where such material was protected by the First Amendment.

For global jihadis, the Syrian jihadi theorist Abu Mus’ab al-Suri (Mustafa Setmariam Nasar) played an intellectual role comparable to that of Louis Beam. Following the Soviet withdrawal from Afghanistan, al-Suri began in 1990 to conceptualize a global phantom organization (tanzim ashbah) in place of the numerous nationally-focused organizations that characterized the salafi-jihadi movement at the time. Thinking in parallel to Beam, al-Suri argued for “individual action” (al-’amal al-fardi) that would replace the bureaucracies of contemporary jihadi groups. He ultimately summarized this model as “system, not organization” (nizam, la tanzim). Al-Suri was an accomplished propagandist and an organizational visionary, but he did not build and manage websites himself. Nonetheless, he popularized the use of the internet among jihadis—and his ideas illustrate the continuity of capability that global connectivity offers extremists.

Sometimes those continuities are reflected unintentionally in the language used by scholars to describe these capabilities. Al-Suri’s most detailed biographer foreshadowed the QAnon exhortation to ignore mainstream experts by describing al-Suri as “acutely aware of [the internet’s] potential to empower the masses to conduct their own research, communicate with one another, and identify with an idea larger than themselves.” In the parlance of the modern internet, al-Suri was an influencer.

“Even today, as techno-dystopia feels closer than ever, far more people have used the anarchy of digital communications for good than for evil.”

Of course, Barlow and his co-travelers were not wrong about the promise of cyberspace to produce tremendous good. The internet revolutionized commerce and allowed people around the world to communicate more freely than ever before. Liberating educational opportunities abound, and populist movements like the Arab Spring have challenged corrupt and autocratic hierarchies. Even today, as techno-dystopia feels closer than ever, far more people have used the anarchy of digital communications for good than for evil.

There is no denying, however, the persistence of violent extremists using digital technology. Over the years, extremists of many different ideological stripes have used a wide range of software to operate online. The earliest used simple open-source bulletin board software. In the 1990s, sites moved to web forums and early web hosting systems like Geocities. Prior to the rise of modern social media and messaging applications, one of the most important software packages was vBulletin, a program for running digital web forums. In addition to powering my favorite San Francisco 49ers forum in the mid-2000s, vBulletin undergirded the digital communications of al-Qaeda, many of its cousins, and the prominent white-supremacist forum Stormfront. Larger files were uploaded in various formats to innumerable free file-hosting sites for download.

Al-Qaeda relied on vBulletin and file-hosting sites for propaganda until 2008, when a coordinated attack on key forums disrupted the core vBulletin-based network. (One forum was left functional as a honeypot run by Saudi Arabia.) That disruption corresponded with increased interest among jihadis in various new social technologies. Extremists wrote primers on new technologies like blogs, tracked the centrality of technology to the 2008 Democratic primary campaign, and explored social media. One jihadi later noted that in the face of attacks on bespoke servers and forums, fellow extremists needed to repost content on Facebook, YouTube, and other major social platforms: “so long as the house slave [President Obama] tries to hide [our videos], we will release them on the Crusaders’ very own websites.”

Jihadis did take that approach. After I joined Meta in 2016, we developed the ability to identify old al-Qaeda and proto-Islamic State material on the platform. As I recall, the earliest material was posted in 2007, although these groups did not aggressively turn to social platforms until a few years later. We removed this old material, although most of it was no longer doing much—if any—practical harm. The old accounts were long dormant, and the overwhelming majority of the content had not been viewed by anyone in years.

The jihadi shift to social media was echoed by other extremist movements, including white supremacists, though the latter did not face significant disruption to their core platforms until years later. The disruption of bespoke vBulletin-based sites likely encouraged jihadi migration to social media, but violent extremists ultimately shifted to Facebook, Twitter, and YouTube for the same reasons other people did: they were simple, functional, reliable ways to access a network.

As a practical matter, the shift in digital geography meant that if anyone was going to take action against these malicious actors, it would have to be the platforms themselves. Traditional national security agencies could not attack Facebook servers the way they might a bespoke vBulletin site hosted overseas. Even messaging strategies and undercover accounts would be more difficult for federal agencies, which were more likely to incidentally come into contact with American citizens on major platforms.

The platforms were extremely slow to accept the responsibility this implied. In 2011 and 2012, representatives from the National Counter Terrorism Center (NCTC) briefed social media executives about Anwar al-Awlaki, an American preacher linked with al-Qaeda in the Arabian Peninsula, whose effective English-language propaganda was increasingly linked to radicalization cases in Europe and North America. Companies dutifully sat through such briefings—I helped organize some of them, so I know—but failed to invest systematically in addressing terrorist material online. Social platforms did not systematically shift their approach until 2015, when the Islamic State was rampaging across Syria and Iraq and using social platforms to inspire violence in Europe and the United States. Unsurprisingly, western politicians began placing increasingly intense pressure on platforms to take action.

But platforms were not the only ones late to address the abuse of the internet by extremists. Journalists, activists, and policymakers celebrated the digital revolution for more than a decade after the Godfather of the Islamic State beheaded Nicholas Berg on camera in 2004. Despite Stormfront’s tenure as a core white supremacist forum from 1995 forward, a Google News search for “Stormfront” limited to results before December 31, 2008 produces fewer than 50 results—and quite a few of them are about weather. Virtually all of the stories about al-Qaeda’s digital exploits in the 2000s were written by national security reporters rather than journalists assigned to Silicon Valley. Extremists built their digital infrastructure under the noses of both Silicon Valley and its supposed watchdogs.

These historical failures to recognize the internet’s harms continue to shape contemporary critiques. In April 2022, for example, the psychologist Jonathan Haidt published a controversial article in The Atlantic, arguing that social media is at the heart of contemporary democratic malaise. Haidt’s most pointed arguments focus on misinformation, which is not my focus here. But it is striking that Haidt’s analysis is so centered on recent events. Even a related (and highly useful) compilation of academic studies he helped gather on digital harms is limited to studies from 2014 or later. This effectively leaves out two decades or more of data about online harms before the age of centralized social media.

Granted, Haidt is focused on features of social media specifically—but in constraining his attention to that issue, he effectively selects on the dependent variable of social media features, excluding study of the more general impact of the internet. This is problematic given Haidt’s sweeping analysis, including regarding violent extremism, and equally sweeping policy proposals. The issue is not just government policy. By focusing narrowly on a select set of features—like ad-based revenue, recommendation systems, and the desire to bolster engagement in the form of reactions such as “likes”—Haidt gives license to digital innovators to ignore their social responsibilities so long as they do not utilize those features.

The risk is that this will empower innovators who, like those at Substack, end up making old mistakes all over again instead of learning to exercise caution and responsibility from our recent digital malaise. Meanwhile, violent extremists—who have migrated from bulletin boards to Geocities to vBulletin to Facebook—will keep an eye out for the next big thing, whether the platform in question charges for ads or not. Let’s not fight the last war.

Large social media platforms did eventually dedicate significant resources to counterterrorism work. Those efforts dramatically curtailed extremist activity online but did not eliminate it. Trust and safety teams still struggle for resources across the industry. Moreover, such efforts only came after the most ruthless terrorist group imaginable seized territory the size of the United Kingdom and platforms came under intense—albeit mostly informal—pressure from policymakers. Today, the persistence of violent extremism online has convinced many that stronger government intervention is necessary to compel more aggressive platform actions. Indeed, there is little doubt that companies could better defend their platforms. At the same time, company successes have been significant and often occur out of view of researchers; they should not be ignored, because regulators intent on rectifying the gaps in platform defenses may inadvertently undermine the progress made to date.

Countering Violent Extremism Under 230

Social media companies were slow to address terrorism and extremism online, but large platforms now have a range of tools and processes to counter these harms. The nuts and bolts of what is now broadly called “trust and safety” include seven categories of action: writing rules; identifying and removing violations; limiting feature access; responsibly implementing legal and political restrictions; responsibly engaging law enforcement; integrating with industry; and empowering users.

Below, I walk through these seven categories. Given my background, the examples I’ll use to illustrate these types of action generally involve Meta and terrorism, but the principles apply across platforms and harm types. The purpose is twofold: first, to illustrate the sort of actions platforms have taken under §230; and second, to build a rough framework for future regulators to compel detailed disclosures from platforms about their systems. This section is not a comprehensive description of Meta’s actions regarding terrorists and hate groups, nor does it claim that Meta’s actions to date are sufficient.

Write rules

Platform policies are intended to limit harm, which means that harm must be defined. Such policies are organized around actors (rules addressing certain people or groups, like Nazis), behavior (rules addressing certain digital actions regardless of content, like spamming or creating fake accounts), and content (rules defining what material is acceptable on the platform itself, like prohibitions on bullying or hate speech). Some policy areas may include a mix of policy types. For example, Meta’s policy on dangerous organizations mixes actor and content components. It prohibits a set of listed groups from having a presence on the platform regardless of their behavior. This is an actor-based approach. It also prohibits any user from “praising” some of those entities on Meta, which is a content-based approach.

Conceptual rules to limit harm often get deeply complex in practice. Banning an entity like the Islamic State is a straightforward moral and political proposition, but it’s more difficult to ground such a decision in clear rules that can be used to assess other, more marginal entities. Despite general opposition to civic violence, for example, there is no universal definition of “terrorism,” notwithstanding dedicated efforts to pin down the term by a wide range of scholars and political bodies. And unfortunately for digital policymakers, defining concepts like “terrorism” is just the beginning when it comes to drafting policy. It is useful to think about this process as consisting of five components:

  1. Define the harm. For example, “terrorism.”
  2. Assess whether entities meet that definition. For actor-based policies, this means building a process to specify criteria that reflect definitional elements, prioritize which groups to assess, gather information, and ultimately determine what real-world entities (groups, individuals, etc.) meet the defined policy.
  3. Define what constitutes a violation. Platforms must determine what actually qualifies as a violation of the rules and what does not. Meta, for example, prohibits praise, substantive support, and representation of so-called Tier 1 Dangerous Organizations. This sweeping prohibition effectively excludes everything except news reporting, criticism, and general discussion of the entities. The terms have colloquial meaning, but they demand detailed guidance for reviewers.
  4. Identify what indicators the platform is looking for. This means identifying symbols, slogans, and sub-entities representative or suggestive of the entity or policy in question. For the Islamic State, this might be various media outlets, key leadership figures or core slogans like “remaining and expanding.” For a content-based policy like a prohibition on hate speech, it might include Holocaust denial or calls for the “Day of the Rope” or “RAHOWA” (Racial Holy War). For a behavior-based policy, the platform would need to granularly specify the actions (extreme posting activity, impersonation, etc.) that mean an account or network violates the rules.
  5. Scope consequences. Finally, platforms must determine what happens if an account or piece of content violates the established rules. Is it removed or downranked? Will the account be removed or accrue “points” that lead to removal in the future? Is the punishment absolute or does it vary by circumstance?

Let’s run through a real-world example, tied to Meta’s Dangerous Organizations policy. In 2020, Meta (then Facebook) designated an entity it called the Violent Unaffiliated Anti-Government Network (VUAN) as a terrorist group. Meta had defined “terrorism” years earlier. That definition was component one. Now, Meta had to assess step two—whether the VUAN met that definition. This was tricky because Meta’s designation criteria looked for clear examples of violence committed by a distinct organization. The VUAN was a relatively loose organization operating within the wider Boogaloo movement, a highly amorphous, violence-celebrating aesthetic collective. After intense investigation and evidence of violence, Meta determined that the VUAN was solidified enough to designate under the preexisting definition of terrorism. That was component two. That triggered the pre-existing Dangerous Organizations policy, including prohibitions on praise, support, and representation. That’s component three. But the VUAN and wider Boogaloo movement quickly evolved their language and aesthetic, using thinly coded language like “Big Luau” and “Fiesta Boys.” The shifts were intuitive for dedicated specialists inside and outside of Meta; repeatedly updating those indicators for non-specialist content moderators without inspiring false positives was more difficult. That’s component four. In this case, the consequences for violations—remove content, remove users—were already set by the pre-existing policy. That’s component five.
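
To make the five components concrete, the sketch below shows one way a policy of this kind could be represented in code. It is a hypothetical schema written purely for illustration: the class names, fields, and the notional “terrorism” definition are mine, not Meta’s, and real policy systems are vastly more elaborate.

```python
from dataclasses import dataclass
from enum import Enum


class PolicyType(Enum):
    ACTOR = "actor"        # rules about who may use the platform
    BEHAVIOR = "behavior"  # rules about how accounts act (e.g., spam, fake accounts)
    CONTENT = "content"    # rules about what may be posted (e.g., praise of banned groups)


class Consequence(Enum):
    REMOVE_CONTENT = "remove_content"
    REMOVE_ACCOUNT = "remove_account"
    ADD_STRIKE = "add_strike"


@dataclass
class HarmPolicy:
    """Illustrative container for the five components described above."""
    harm_definition: str                  # 1. define the harm
    designated_entities: set[str]         # 2. entities assessed to meet the definition
    violation_types: set[str]             # 3. what counts as a violation
    indicators: dict[str, list[str]]      # 4. symbols, slogans, and aliases to look for
    consequences: list[Consequence]       # 5. what happens when the rules are broken
    policy_type: PolicyType = PolicyType.ACTOR


# A notional record loosely mirroring the VUAN example above; none of these
# strings reflects Meta's actual internal definitions or designation files.
terrorism_policy = HarmPolicy(
    harm_definition="Premeditated violence against civilians to achieve political ends",
    designated_entities={"Violent Unaffiliated Anti-Government Network"},
    violation_types={"praise", "substantive support", "representation"},
    indicators={
        "Violent Unaffiliated Anti-Government Network": ["Big Luau", "Fiesta Boys"],
    },
    consequences=[Consequence.REMOVE_CONTENT, Consequence.REMOVE_ACCOUNT],
)
```

Even this toy version makes the maintenance burden visible: the indicators list is the part that adversaries force a platform to update constantly.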

Even a good policy can be hamstrung by the failure to adequately identify or update indicators that a policy has been violated. For example, Meta CEO Mark Zuckerberg’s 2018 decision to allow Holocaust denial on the platform, even as antisemitism and hate speech were banned more widely, undermined the value and utility of the entire policy. Zuckerberg’s decision failed to recognize that Holocaust denial is a core and common element of modern, organized antisemitism and much more rarely expressed with true naivete. Meta and Zuckerberg belatedly fixed this mistake in 2020.

Zuckerberg’s decision on Holocaust denial undermined Meta’s broader credibility on antisemitism and hate speech more generally, despite the fact that the company remained significantly more aggressive than most governments in countering organized hate groups. The U.S. government has designated 73 entities as Foreign Terrorist Organizations, only one of which—the Russian Imperial Movement—is a white supremacist organization. Via the Treasury Department, the government lists thousands of entities, but largely as aliases and affiliates that indicate violations of prohibitions leveled against a smaller number of core organizations, like al-Qaeda, Hizballah, and the Islamic State. The European Union has named 21 terrorist groups subject to economic restrictions, the majority of them violent Islamist groups and nationalist organizations. Collectively, the United States, EU, United Kingdom, Canada, Australia, and France have only listed 13 white supremacist groups under various counterterrorism laws.

Conversely, Meta lists thousands of organizations overall and hundreds of white supremacist organizations subject to its most stringent restrictions. For better or worse, and perhaps counterintuitively, the tech company has a far more aggressive policy approach toward white supremacist organizations than any western government. Meta’s approach to proscribing terrorist groups and hate organizations is far more comprehensive, globally speaking, than that of the U.S. government, European Union, United Kingdom, Australia, or New Zealand.

“For better or worse, and perhaps counterintuitively, the tech company [Meta] has a far more aggressive policy approach toward white supremacist organizations than any western government.”

Of course, comparing a social media company’s approach toward hate and political violence to a government’s is contentious. Governments and private actors have different responsibilities, authorities, and priorities that drive many of the differences. The U.S. government, for example, uses sanctions as targeted political tools to advance foreign policy objectives and is rightly restricted by the First Amendment from limiting groups based on their political outlook, however noxious. Private actors, meanwhile, might be expected to take more aggressive action than the government to protect their customers and user base. A bar overrun by Nazis will lose its regular customers; a website overrun by the same will as well.

Even if internet companies are sometimes more aggressive than governments at proscribing terrorist and hate groups, the historical record clearly indicates that they sometimes act forcefully against such groups only after significant legal and political coercion. Indeed, pressure from political leaders is partly what compelled social media companies to address ISIS. And many who advocate renewed regulation of social media do so because they believe that government influence should be codified. Yet some political actors have goals that are nefarious and partisan. Even if constitutional, strengthening the ability of government actors to compel companies on issues of speech creates real risk of abuse.

In the United States, §230 leaves defining “terrorism” to individual platforms (though many companies recognize legal obligations related to U.S. sanctions). Europe, however, is pulling some of this authority back toward governments. The EU’s Terrorist Content Online regulation grants administrative bodies in member states the authority to identify digital content that supports terrorism and to order platforms to remove it. Although companies can appeal such orders, only a few are likely to have the legal capacity to file appeals at scale, and those appeals will take months, if not years, to adjudicate. The regulation effectively grants national governments extraordinary latitude to order the removal of speech with minimal judicial oversight. Such a provision is unthinkable in the context of the First Amendment. In the United Kingdom, sponsors of the Online Safety Bill (formerly the Online Harms bill) have removed provisions requiring companies to delete content deemed “legal but harmful,” a concept that was never fully defined.

Could the U.S. government compel companies to address groups that it has not legally proscribed? If not, might incentivizing companies to enforce only against legally sanctioned groups effectively disincentivize them from addressing others? Might that in some cases reduce overall efforts by technology companies to counter terrorism and hate given that some companies proscribe a much wider range of groups and materials than the government?

“But if writing rules for speech is incredibly difficult, dangerous, and error prone, figuring out how to enforce those rules at the scale of social media is even more so.”

These questions are difficult when it comes to terrorism, but they are even more perplexing in other policy arenas, such as misinformation and hate speech. While famously difficult to define, terrorism is traditionally predicated on real-world violence. This is easier to recognize than misinformation or speech that may be friendly in one context and hateful in another. But if writing rules for speech is incredibly difficult, dangerous, and error prone, figuring out how to enforce those rules at the scale of social media is even more so.

Identify and remove violations

Writing rules is hard; enforcing them is harder. The process known as “content moderation” has innumerable components, but it is useful to think about moderation as consisting of six basic components: human beings, detection, decisions, investigations, record-keeping, and quality control.

Content moderation systems begin with human beings. People must draft guidelines, write software, and make decisions. As a practical matter, hiring the people to manage globe-spanning content is often difficult for companies. A large platform like Meta must find employees who speak dozens of languages and hundreds of dialects. Those people are expected to make decisions about everything from terrorism to bullying and nudity, recognizing local slang and social context while applying global rules independently of the social and political mores of any particular location. There is sometimes tension between utilizing reviewers empowered with local knowledge and finding reviewers unencumbered by local biases.

Pre-pandemic, reviewers were generally expected to work in an office setting. This arrangement advanced data security and limited the sense of isolation that can develop among reviewers sometimes dealing with horrific material, but it also created challenges for hiring. It meant that reviewers had to be located somewhere the company or one of its contractors had an office. There are other challenges. The people hired must actually be good at the job, which often requires complex human and technical systems to assess. Error-prone or politically biased reviewers have to be let go and replaced. Companies aim to have review teams working in multiple locations in order to have reliable 24-hour coverage. They must take into account vacations, sick leave, and the rest. Satisfying these needs is easier with common languages and dialects but more difficult for smaller language communities with minimal diasporas in places where tech companies and outsourcing companies have offices. And even in the best of conditions, well-intentioned, well-trained human beings make mistakes and regularly disagree with each other about how to apply content moderation rules.

Although there are exceptions, in general employees and contractors working for technology companies are not actively seeking out harmful content on their platforms. Harms are generally reported by users; referred by so-called “trusted flaggers” or specialized intelligence vendors working for companies; or identified using simple automation or via more sophisticated classifiers.

One of the most dispiriting early lessons I learned at Meta was that user reports, paradoxically, are both critically important and wildly unreliable. They proved to be a tremendously inefficient method of identifying terrorist or hate group material. So much anodyne material gets erroneously reported that reviewing user reports is almost a total waste of time and effort—and certainly less valuable than reviewing reports from trusted flaggers or generated by various kinds of automation. Why are user reports so bad? Sometimes the reporting seems malicious; other times it is simple user error. And still other times, reporting users are upset about something they see online that simply does not violate platform rules.

Whatever the reasons, the general unreliability of user reports forces platforms to balance the desire to limit violations on the platform and the need to respond to user complaints as a matter of customer service (not to mention the PR risk of ignoring a rare user report that does point to a serious violation). Customer service imperatives suggest that user reports should be prioritized, but the resources spent responding to frivolous complaints could almost certainly be better spent assessing material detected via other mechanisms.

Trusted flaggers, whether volunteers or paid vendors, are far better at identifying violating material. But such referrals will never reach the scale necessary to defend a large platform like Meta. The primary advantage of this method is that dedicated researchers can identify a trend or network on one platform and then follow it to another platform. Unlike the other detection methods described here, these approaches are fundamentally cross-platform, which is an important distinction and sometimes a critical one.

At the vast scale of the internet, automation is a critical tool for identifying potential harms. It is useful to think about automation on two levels: simple and complex. Simple automation matches static information to identify problematic content or patterns. This includes keyword searches, hash-matching, and various rule-based detection schemes. Sometimes these systems are extremely effective, especially when combined with intelligence collection and sharing. When I arrived at Facebook, we built a high-speed intelligence cycle to collect Islamic State propaganda, assess and classify it internally, and use hash-matching to find other versions posted to Facebook. The early system did not use any true artificial intelligence, but it was highly successful.
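
Hash-matching of this kind is conceptually simple, as the minimal sketch below illustrates. It assumes a pre-built set of hashes of files already judged to be violating; the function names and placeholder values are hypothetical, and this is not a description of Facebook’s production system, which for images and video generally relies on perceptual rather than exact cryptographic hashes.

```python
import hashlib


def sha256_of_file(path: str) -> str:
    """Compute an exact cryptographic fingerprint of an uploaded file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hashes of files previously assessed and classified as violating.
# The values below are placeholders, not real hashes.
KNOWN_VIOLATING_HASHES = {
    "placeholder_hash_of_known_propaganda_video",
    "placeholder_hash_of_known_propaganda_image",
}


def matches_known_material(upload_path: str) -> bool:
    """Flag an upload whose fingerprint matches previously classified material."""
    return sha256_of_file(upload_path) in KNOWN_VIOLATING_HASHES
```

One limitation is worth noting: an exact hash changes if the file is altered by even a single byte, which is one reason industry hash-sharing efforts tend to use perceptual hashing that tolerates small edits.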

Complex automation, however, requires building sophisticated classifiers that not only match known bad content but also can assess novel material and determine the likelihood that it violates some predetermined rule. Using such tools to achieve policy ends is an art in itself—and in that way, social media companies are canaries in the coalmine for lawmakers and bureaucrats around the world who will increasingly need to both set policy constraining the use of AI and establish guidelines for implementing policy via AI.

With that in mind, it is important to understand AI’s limitations. Basic AI systems are trained on decisions made by human beings, so if those human beings systematically make a mistake, the AI will also. The record of human decisions is known as training data. In general, the more of it, the better. The more diverse it is, the better. To ensure that AI adapts along with changes in behavior by regular users and bad actors, platforms must continually update that training data with more human decisions. The AI will learn from those updated decisions, but that learning will lag behind dynamic adversaries that alter their behavior in response to AI-driven detection.

Social media companies often use AI for more than just determining whether to take a final action on a piece of content. For example, AI can prioritize which content humans should review first and cluster similar content to be reviewed together. It can also route content to a reviewer who can make sense of it—something that sounds easy until you are trying to find someone awake during daytime in the Western Hemisphere who can comprehend an essay about obscure religious issues written in a Bahasa-Arabic hybrid and drafted in partially transliterated Latin script.
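
The prioritization and routing role can be illustrated with a short sketch. Everything here is hypothetical (the job fields, the severity-times-confidence priority score, the language and specialty matching); it simply shows the kind of queueing logic the paragraph above describes.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class ReviewJob:
    sort_key: float                              # lower sorts first; set negative below
    content_id: str = field(compare=False)
    language: str = field(compare=False)
    harm_type: str = field(compare=False)


def enqueue(queue: list, content_id: str, language: str,
            harm_type: str, severity: float, classifier_score: float) -> None:
    """Prioritize by estimated severity weighted by the classifier's confidence."""
    priority = -(severity * classifier_score)    # negate so higher risk pops first
    heapq.heappush(queue, ReviewJob(priority, content_id, language, harm_type))


def next_job_for(queue: list, languages: set, specialties: set) -> Optional[ReviewJob]:
    """Pop the highest-priority job this reviewer can actually assess."""
    skipped, match = [], None
    while queue:
        job = heapq.heappop(queue)
        if job.language in languages and job.harm_type in specialties:
            match = job
            break
        skipped.append(job)                      # hold jobs this reviewer cannot handle
    for job in skipped:                          # return them to the queue for others
        heapq.heappush(queue, job)
    return match
```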

Every AI system will make mistakes. There are two basic kinds of errors, which engineers often discuss in terms of “precision” and “recall.” Precision concerns false positives: the share of content that an AI system identifies as violating that actually is violating. Recall concerns false negatives: the share of actually violating content that an AI system recognizes as such. AI systems can be calibrated differently, but there is always a tradeoff between precision and recall. You can calibrate AI for higher recall, meaning you miss less of the bad stuff, but that inevitably means you reduce precision, meaning more false positives. Better AI reduces the size of the tradeoff—but there is always a tradeoff. Using AI means acknowledging that your system will generate errors.
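
The tradeoff can be stated precisely. Given a sample of decisions checked by human reviewers, precision and recall reduce to simple ratios; the code below is a standard formulation with invented counts, not figures from any platform.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Of everything the system flagged as violating, what share actually was?"""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Of everything that actually violated, what share did the system catch?"""
    return true_positives / (true_positives + false_negatives)


# Notional example: the classifier flags 1,000 posts, 900 of them correctly,
# while missing 300 violating posts that it never flagged.
print(precision(true_positives=900, false_positives=100))  # 0.90
print(recall(true_positives=900, false_negatives=300))     # 0.75
```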

Distrust of AI sometimes leads to calls for human review of all content moderation decisions. But as a practical matter, human reviewers also reliably make mistakes. The difference is that AI demands that policymakers explicitly accept error rates in exchange for scale, whereas human error is more often thought of as incidental. Regardless of this perception, it is no less inevitable. The advantage of human mistakes is that they generally make sense: perhaps someone misunderstood satire, overlooked key context, didn’t recognize a symbol, etc. AI mistakes can be inexplicable, and when they are, those mistakes are particularly difficult to diagnose and fix.

After content is detected, content moderation teams must make decisions about whether to take an action on that material. Sometimes these choices are made by extensive teams of human reviewers, who make decisions of varying complexity on content. But structuring those decisions is not straightforward. Should a single reviewer be able to delete something, or should a piece of content be reviewed multiple times before the decision is affirmed? Should that single human decision on one piece of content be scaled automatically across similar material posted elsewhere, or should all those pieces of content be reviewed independently? Is it better to organize review teams by language and train reviewers on all violations or is it better to have teams organized by subject matter? Should reviewers focus on the most acute harm types or address all types of harm equally?

The questions are just as complex when setting guidelines for AI. Is it appropriate to allow AI to remove content if you can expect a 10% false positive rate, which may mean wrongfully deleting millions of pieces of non-violating content? Deciding not to employ AI for such decisions might require hiring such a massive review team that it would be cost prohibitive for most companies and create both management and human error issues. The problem keeps getting harder: is it appropriate to remove content from recommendation features if you can expect a 30% false positive rate, meaning that nearly one third of the material you sanction is actually non-violating? (After all, you’ll still be minimizing the harm done by the 70% of material correctly identified as dangerous.) Should those rates change depending on circumstances in the real world, like an election or increased threats of violence? If so, by precisely how much?

At its most tactical, content moderation is a series of decisions about individual pieces of content or accounts. But a critical function of a broader trust and safety system is to keep track of those violations, both for direct operational purposes and for strategic analysis and transparency disclosures. Operationally, companies often decide to take stronger actions against accounts that repeatedly violate the company’s content standards. This means determining how many “points” or “strikes” an account should receive for a particular violation and whether those penalties should expire over time or persist forever. Such systems may seem simple, but they grow complex given the number of users, the variety of potential violations, and the competing impulses to inform constructive users of their violations so they understand platform rules and to withhold information from bad-faith users who will use any detail to better circumvent those rules. For those users, record-keeping is often the only way a company can identify a terrorist propagandist returning to a platform using a new account.
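
A hypothetical strike ledger shows why even this bookkeeping involves real policy choices. Every value below (points per violation type, the expiry window, the removal threshold) is invented for illustration and does not correspond to any platform’s actual system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Invented point values and thresholds; each is a policy decision in its own right.
POINTS = {"hate_speech": 2, "dangerous_org_praise": 3, "spam": 1}
REMOVAL_THRESHOLD = 5
STRIKE_EXPIRY = timedelta(days=365)  # or never expire? Another policy decision.


@dataclass
class AccountRecord:
    account_id: str
    strikes: list = field(default_factory=list)  # (timestamp, points) pairs

    def add_violation(self, violation_type: str, when: datetime) -> None:
        """Record a confirmed violation against the account."""
        self.strikes.append((when, POINTS[violation_type]))

    def active_points(self, now: datetime) -> int:
        """Sum points from strikes that have not yet expired."""
        return sum(pts for ts, pts in self.strikes if now - ts < STRIKE_EXPIRY)

    def should_disable(self, now: datetime) -> bool:
        """Disable the account once active points cross the threshold."""
        return self.active_points(now) >= REMOVAL_THRESHOLD
```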

Record-keeping is equally important for quality control and, increasingly, for public transparency. Companies assess the effectiveness of their violation discovery mechanisms, the accuracy of their reviewer decisions, and how to allocate resources to various markets. Companies also increasingly publish public transparency reports. These reports are both treasured and maligned by researchers and the media, in large part because they release only summary statistics and usually describe the violations a platform finds and addresses, not those it misses. The most sophisticated transparency reports use extensive sampling techniques to estimate the overall rate of different kinds of violations on a platform, but such efforts are likely cost prohibitive for all but the wealthiest companies. Such mechanisms are also problematic for low-prevalence, high-severity harms like terrorist or hate group activity. This material is always a tiny sliver of the overall content on a platform like those owned by Meta, but it can have an outsized societal impact.
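
Prevalence measurement of this kind is, at bottom, survey sampling: label a random sample of content or views, extrapolate, and attach a margin of error. The sketch below uses a standard binomial approximation and invented numbers; it also hints at why the method strains for rare harms, since a low-prevalence violation may barely appear in even a large sample.

```python
import math


def estimate_prevalence(sample_size: int, violating_in_sample: int,
                        z: float = 1.96) -> tuple:
    """Return a point estimate and an approximate 95% confidence interval."""
    p = violating_in_sample / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), p + margin


# Notional: reviewers find 40 violating items in a random sample of 100,000 views.
estimate, low, high = estimate_prevalence(sample_size=100_000, violating_in_sample=40)
print(f"Estimated prevalence: {estimate:.4%} (95% CI roughly {low:.4%} to {high:.4%})")
```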

In order to address high-impact harms, like terrorist activity online, some platforms have developed sophisticated investigations teams. These teams are smaller, better-trained, and better-resourced than traditional content moderators. They conduct investigations of networks of accounts in high-severity situations—either to support referrals to law enforcement when violence seems imminent or as a mechanism to remove entire networks of violating accounts at one time. That tactic is designed to complicate the ability of a disabled user to recreate their digital network with a newly created account.

It should go without saying that these various defensive tactics should vary from platform to platform, just as an operational commander will prioritize certain systems and tactics in a more traditional real-world conflict. Regulators are not well-positioned to balance the value of these techniques on a single platform, let alone dictate such a balance across industry more generally.

Feature restrictions

Content moderation was historically a binary process. If content violated the rules, the platform took it down. At some threshold, platforms might disable a problematic account. But content moderation systems have grown more sophisticated over time. Now, platforms may take action outside that binary by restricting access to certain features for users or restricting content that poses a risk to the broader user base or society in general. Think of these steps as the equivalent of no-fly lists: content and accounts listed are not going to jail, but, as a result of behavior indicating they pose an increased risk, they lose access to privileges that, if abused, might facilitate serious harm.

“Incorporating increasingly nuanced actions into content moderation is a positive development, but it complicates the prospect of setting cross-platform policy guidance and is often confusing for users.”

Incorporating increasingly nuanced actions into content moderation is a positive development, but it complicates the prospect of setting cross-platform policy guidance and is often confusing for users. A binary decision framework—leave it up or remove it—is relatively transferable across different platforms; restricting features, whether by removing content from recommendation engines, adding context to misleading content, or limiting search options, will vary based on the features of the platform where it is deployed. Nonetheless, non-binary actions are key to the content moderation toolkit—so it is worth developing some frameworks to think about these actions.

Generally speaking, platforms will restrict features in three broad scenarios and with three basic modes of restriction.

Behavioral oddities are perhaps the most common scenario that inspires a nuanced platform response. In this situation, a complex object, like an account, group, page, network, or piece of content, shows enough behavioral oddity (for example, a piece of content posted many times in a very short time period) to suggest it is inauthentic or does not reflect the purpose of the platform. Signals that content is low quality may also prompt a platform response. Sometimes called a “borderline” policy, this refers to circumstances when some element of the complex object or piece of content is problematic or low value but does not qualify to be removed from the platform. Finally, platforms sometimes take action when they have some signal that a policy violation has occurred, but that signal is ambiguous or unconfirmed. For example, a complex object or piece of content may be assessed by an automated classifier as likely to violate Terms of Service, but a human reviewer has not been able to confirm the violation. In some cases, restrictions in this final category may last only until the content has been deemed violating, at which time it will be deleted, or determined to be non-violating, at which time restrictions will be lifted.

The specific restrictions vary depending on the nature of the platform itself, but they can be grouped into three broad modalities. The first is designed to limit reach. This is the purpose of limits on ads, removal from recommendation surfaces, and the like. A second set of restrictions aims to limit the ability of harmful actors to operate in secret. This might manifest as limiting an account’s ability to administer a private group or to access messaging products. Finally, platforms may limit a user’s ability to transmit information quickly. This manifests as restricting access to real-time features, like live video, that certain bad actors find very attractive.

It is useful to think through a few examples to illustrate the scenarios and modalities of non-binary restrictions. Meta introduced a range of reforms following the 2019 Christchurch terrorist attack, one of which was to limit access to certain features, notably live video, for users that had violated its Dangerous Organizations policy. The rationale for action was that the users in question had posted low-quality content, in the form of material that actually violated platform rules. That material was removed, but a secondary action was to limit the ability of the offending user to utilize tools that could be used to amplify the impact of a violent attack.

The effort was not unique. In the fall of 2021, Meta consolidated a broader strategy toward feature restrictions in its Content Distribution Guidelines, which describes how content moderation principles are integrated into Meta’s core ranking algorithm and Newsfeed. Most of these refer to spammy and low-quality content, material posted by users with Community Standards violations, and “borderline” content deemed to come close to a terms of service violation. Per Meta, this last category is designed to limit the distribution of material that is “sensationalist or provocative and can bring down the overall quality of discourse on our platform.”

Meta also said that it would reduce distribution of content that “likely” violated its Community Standards. As noted above, AI does not produce binary outcomes. In this case, the AI used by Meta provides a score that reflects the likelihood that a particular piece of content does or does not violate platform rules. Let’s use notional numbers to illustrate the point: If the content is 99% likely to violate the rules, maybe it seems reasonable to trust the AI and just remove it completely via automation. That will result in false positives, and perhaps a large number of them if there is a lot of content, but perhaps the rate of false positives seems reasonable. But what if the classifier indicates the content is 90% likely to violate? Maybe (and this is a big assumption, especially for smaller firms) the company has enough reviewers to manually assess those and double-check the classifier. Good. What, then, should the company do with the content that is deemed 60% likely to violate (which, for the sake of this example, means that 40% of the content identified is not violating)? Should this material be automatically removed? Doing so would mean knowingly removing a large amount of non-violating content (false positives) from the platform. At this level of certainty, manually reviewing every piece of content may not be feasible. So, what do you do? Meta’s Content Distribution Guidelines indicate that such material may receive limited distribution, rather than being removed outright, as a way of limiting the damage that the 60% that is violating can do while at least leaving up the 40% that is not. Of course, this means, de facto, that some totally non-violating content will have its distribution restricted.
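
Using the notional numbers above, the underlying decision logic might look something like the sketch below. The thresholds and action names are illustrative only; they are not Meta’s actual values, and real systems tune them separately for each policy area, market, and model.

```python
from enum import Enum


class Action(Enum):
    AUTO_REMOVE = "auto_remove"                  # trust the classifier outright
    HUMAN_REVIEW = "human_review"                # confident enough to spend reviewer time
    LIMIT_DISTRIBUTION = "limit_distribution"    # likely violating, too uncertain to delete
    NO_ACTION = "no_action"


# Illustrative thresholds; each is a policy choice with its own error tradeoff.
AUTO_REMOVE_THRESHOLD = 0.99
HUMAN_REVIEW_THRESHOLD = 0.90
DOWNRANK_THRESHOLD = 0.60


def decide(violation_probability: float) -> Action:
    """Map a classifier's confidence score to an enforcement action."""
    if violation_probability >= AUTO_REMOVE_THRESHOLD:
        return Action.AUTO_REMOVE
    if violation_probability >= HUMAN_REVIEW_THRESHOLD:
        return Action.HUMAN_REVIEW
    if violation_probability >= DOWNRANK_THRESHOLD:
        return Action.LIMIT_DISTRIBUTION
    return Action.NO_ACTION
```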

Applied at scale, non-binary enforcement actions fundamentally impact how digital products work. In doing so, they can profoundly affect a platform. Like all other tools, they are fundamentally and irreconcilably imperfect. They mean leaving some noxious content online (even if the harm that content does has been limited) and restricting some benign material.

It’s not clear what various formulations of §230 reform, including the Supreme Court’s pending decision in Gonzalez v. Google, might mean for these situations. Should platforms be held liable for material they restrict but do not remove? Does downranking such content mean that a platform is “aware” of the harmful content and thus should be held liable for it? Will Congress or the executive branch prescribe acceptable enforcement actions for different classifier confidence thresholds? Those thresholds are likely to vary wildly from platform to platform. If platforms can be held responsible for noxious content that AI deems likely to violate but that they do not remove, will companies simply stop classifying the content on their sites in order to limit that liability? If so, this could have the perverse effect of increasing the harm produced by such material. Or will platforms be incentivized to remove far more contentious political and religious speech, even if it does not call for violence? The bottom line is that if regulation does make platforms liable for decisions to restrict content or users based on ambiguous signals, Congress, agencies, or courts may need to adjudicate the AI thresholds companies use to inform such decisions. To date, there is little evidence that the legislature, the judiciary, or the executive branch is well positioned to make such judgments.

Responsibly implement legal and political restrictions on content

The First Amendment generally limits the ability of the United States government to force platforms to remove content, though these restrictions are relaxed when it comes to child sexual abuse material. Sanctions law may offer another exception.

The International Emergency Economic Powers Act (IEEPA) created the modern system of Treasury Department sanctions, many of which prohibit American entities from providing goods and services to sanctioned entities. IEEPA was updated in 1994 to include electronic communications. At the same time, a provision known as the Berman Amendment created carve-outs allowing most sanctioned entities to utilize American communications services, on the theory that media penetration into closed societies is a net positive. The Berman Amendment, however, excluded terrorism sanctions regimes from that communications carve-out. As a result, some social media companies have determined that they may not knowingly allow entities designated by the U.S. government as Specially Designated Global Terrorists to operate on their platforms. This is a reasonable legal interpretation, but it has been applied inconsistently across social media companies. No platform has faced any real repercussions from the government for this variation in practice, creating considerable ambiguity over what is ultimately required of companies.

A separate set of restrictions applies to entities designated as Foreign Terrorist Organizations (FTOs) by the State Department. It is a criminal offense to provide “material support” for a designated FTO, a prohibition that may include knowingly providing communications services. No social media company employee has been prosecuted for allowing such material, so the limits of the law are unclear. But many platforms simply do not want to be the test case.

The Supreme Court is currently considering two cases–Gonzalez v. Google and Twitter v. Taamneh–that will dramatically impact the legal risk companies face for terrorist material posted on their platforms. The central question in Gonzalez is whether §230’s protections apply when platforms recommend content to users; in Taamneh, it is whether a platform that attempts to remove terrorist content is nonetheless responsible for terrorist content it did not remove because it did not take every conceivable step to prevent it. The Taamneh case in particular offers some prospect for clarifying ambiguity around platform obligations regarding sanctioned parties, though the case itself is strange and such questions would be better addressed by the legislative branch.

In general, the obligations sanctions law produces are similar to an actor-based terms of service standard. They prohibit platforms from allowing certain entities to use a platform or to receive direct support on it, such as fundraising. Notably, however, there is no clear legal prohibition on simple praise of sanctioned groups. That means, for example, that Meta’s prohibition against praise, substantive support, and representation of dangerous organizations likely extends beyond what is legally required regarding sanctioned entities.

For this reason, the recent Texas social media law, which prohibits “viewpoint discrimination,” is likely to benefit sanctioned groups like ISIS and Hamas. The law has a provision requiring the removal of “illegal” content, which presumably refers to material posted by or on behalf of a sanctioned group or that otherwise constitutes material support for such a group. But praise of a group—for example, “ISIS fighters are the bravest of all”—is disallowed by Meta’s Community Standards even though it is not likely prohibited by sanctions law. Blocking such a statement from appearing on Meta seems to constitute “viewpoint discrimination”—in this example, against ISIS supporters—and thus would be protected by the Texas law.

The Texas law is particularly dangerous because terrorists are not stupid and their behavior is not static. If the courts allow the Texas law to go into effect, extremists will amend their behavior online to reframe activities in terms of legal praise in order to gain nominal protection under the new law and complicate enforcement for platform defenders.

As of this writing, there is no clear requirement for social media companies to search proactively for sanctioned entities on their platforms. This may change with the forthcoming Supreme Court decision in Twitter v. Taamneh, which will assess whether platforms themselves can be held liable for inadvertently providing material support to sanctioned terrorists when those terrorists use digital platforms. Instead of waiting for the Court, Congress should clarify its intent regarding sanctions requirements for social media companies. An optimal solution would give the executive branch the ability to transparently apply or suspend such provisions across terrorism sanctions regimes. The present ambiguity is unnecessary and confers too much authority on the courts to settle ambiguous law. Clarity would simplify compliance for a wide range of digital platforms and facilitate consistency across platforms moving forward.

Responsibly engage law enforcement

In 2021, Meta removed more than 34 million pieces of content for praising, substantively supporting, or representing terrorist organizations. More than 20 million pieces of content came down for similar violations involving hate organizations. These violations were not all equal. A few represent terrorist plotting; more reflect propaganda from prominent terrorist or hate groups. But others represent Pakistani and Kashmiri teenagers crushing on airbrushed images of the deceased Hizbul Mujahideen fighter Burhan Wani; Algerian teenagers posting Osama bin Laden images as profile photos in order to quickly “delete” their accounts (and thereby any record of their digital flirtations); and praise for the sartorial choices of the Proud Boys. A different category reflects campaigning by politicians linked to terrorist groups. In short, many violations of platform rules, even for terrorism, do not necessarily point directly to real-world violence. Companies remove this material because they lack the ability to distinguish banal from dangerous material at scale, and because even banalities can normalize terrorism and hate, which may lead to recruitment. But, more often than not, terms of service violations have little direct link to real-world harm.

But direct, acute real-world harm is organized online. Many companies operate legal teams to receive, parse, and respond to warrants and other legal requests about such material. Companies intent on preventing violence may also make proactive referrals to law enforcement agencies when they identify material that portends real-world harm.

Broadly speaking, planning for real-world harm tends to occur in more private spaces than the bulk of terms of service violations: exclusive groups, threads on messaging tools, and platforms that attract more targeted audiences. This manifests in the form of problematic actors leaving mainstream platforms, shifting to more private spaces within mainstream platforms, and adopting encrypted services. The discrepancy between relatively private, high-severity content and relatively public, lower-severity content is profound. It means that public discussion of violent extremism often centers on the material that is most available and most embarrassing to platforms, not the material most closely tied to real-world violence and harm.

“Any change to §230 must account for the existing regulatory environment outside the United States.”

It is not clear how proposals to adjust §230 would deal with private communications, whether in groups or on messaging platforms. The EU, for its part, has not balanced this tension transparently. The Terrorist Content Online Regulation focuses on removing public material supporting terrorism, while the ePrivacy Directive limits the ability of platforms to scan more private surfaces for terrorist material. There are obvious privacy and security benefits from this posture, but they come at a cost. As a practical matter, the EU has incentivized companies to remove publicly posted terrorist propaganda but hindered their ability to identify privately posted terrorist plotting. Given the privacy and digital security benefits, this tradeoff may be appropriate, but it deserves far more scrutiny than it has received.

Any change to §230 must account for the existing regulatory environment outside the United States. Companies should not be held liable for material on surfaces they cannot legally scan for dangerous material. But, when it comes to terrorism and political violence, private messages are where the most concerning content manifests, and extraterritorial restrictions prevent companies from scanning those surfaces.

Finally, §230 reform should not disincentivize companies from responding to warrants, respecting preservation requests, or proactively providing data to law enforcement when there is a credible and substantiated threat of harm. Since the entire purpose of such disclosures is to reveal the potential for real-world harm, some versions of §230 reform might look to hold companies liable for material so disclosed. This would be a dangerous mistake. Reform must protect good Samaritan referrals and proportionate disclosures in response to appropriate law enforcement requests, and it must not discourage companies from actively identifying and disclosing information to criminal investigators when the threat of real-world violence is clear. At the same time, any safe harbor should be tailored so that it does not incentivize excessive referrals, and it should require transparency disclosures to limit the risk of abuse by government agencies.

Share with industry

Many technology companies now collaborate in the Global Internet Forum to Counter Terrorism (GIFCT), an independent nongovernmental organization that is funded and overseen by industry. The GIFCT, like similar groups focused on child safety, is an important experiment in self-governance by platforms. GIFCT’s best-known program is a database of hashes—essentially, digital fingerprints—of known terrorist content. Member companies contribute the hashes for use by other members as they see fit.
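As a rough illustration of how hash-sharing works, the sketch below fingerprints content and checks it against a shared set of known hashes. It is a simplification: production systems rely on perceptual hashes designed to survive minor edits, whereas the exact cryptographic digest used here matches only identical files, and the database contents shown are invented.

```python
# Simplified sketch of hash-based matching against a shared industry database.
# Real deployments use perceptual hashes that tolerate small edits; the exact
# SHA-256 digest used here, and the sample data, are for illustration only.
import hashlib

def fingerprint(content: bytes) -> str:
    """Compute an exact-match fingerprint of a piece of content."""
    return hashlib.sha256(content).hexdigest()

# Hypothetical shared database seeded with hashes of previously identified content.
known_bad_items = [b"<bytes of previously identified terrorist propaganda>"]
shared_hash_database = {fingerprint(item) for item in known_bad_items}

def matches_shared_database(content: bytes) -> bool:
    """Check a new upload against the shared hash set; each member company
    decides what to do with a match (remove, review, or log)."""
    return fingerprint(content) in shared_hash_database

print(matches_shared_database(b"<bytes of previously identified terrorist propaganda>"))  # True
print(matches_shared_database(b"benign holiday photo"))                                   # False
```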

Importantly, the engineering investment to integrate the GIFCT’s shared database with internal enforcement systems is not trivial. Getting value from shared hash databases like GIFCT’s requires significant engineering support inside companies where competition for resources is intense. During my time at Meta, this occasionally led to near-heroic efforts by individuals committed to the work. I gratefully remember a key engineer making critical adjustments to the GIFCT database (which Meta hosts) from her vacation in the Andes so that GIFCT could collectively respond to the Christchurch terrorist attack.

“Regulatory reform must not create liability for companies that voluntarily utilize shared industry resources or reveal that they have surfaced noxious material on their own platforms by contributing it to a shared database.”

GIFCT has matured a lot since those early days, but if such voluntary efforts are to flourish, it’s important to reward companies politically rather than subject them to liability for participating. Regulatory reform must not create liability for companies that voluntarily utilize shared industry resources or reveal that they have surfaced noxious material on their own platforms by contributing it to a shared database.

Critically, shared resources like GIFCT’s hash database are not panaceas. The National Center for Missing and Exploited Children (NCMEC) has maintained a hash database of child sexual abuse material since 1998, but such material unfortunately has not been eliminated from the internet. Similarly, GIFCT will not eliminate terrorist material online, even if all companies become members. Shared systems are tools that offer companies relatively simple mechanisms to improve defenses against some of the world’s worst people. They will not eliminate these problems.

Educate users

Although the debate about §230 is focused on user generated content, digital platforms do sometimes use their own voices directly for trust and safety operations. This can take several different forms. Companies can, for example, provide basic information about their policies, their mechanisms to enforce them, and their success in doing so. They can also message users about content moderation decisions themselves. This may include warning messages or explanations about why a platform has removed or restricted certain content. Companies should provide their users such clarity (and various regulatory proposals require such messaging), though these messages do carry risk if they offer dangerous actors insight into how they might circumvent a platform’s rules in the future.

Platforms can also label some accounts to ensure that other users understand their biases. For example, Meta labels state media outlets. Coupled with overt limits on the distribution of content from these outlets, these labels represent direct editorial interventions by platforms.

Sometimes, platforms provide information directly. Meta, for example, links to fact-checking services when users share certain misinformation. In 2020, my team collaborated closely with colleagues working on child safety issues, researchers, and child safety organizations to develop messaging for users drawn to QAnon recruitment efforts that used appeals to child safety.

And finally, platforms may explicitly elevate a particular cause, such as voter registration or charitable opportunities. In rare cases, some have used their platforms for explicit political purposes.

We should not overstate the impact of these interventions. It’s not clear that platform fact-checking fundamentally changes how people perceive reality nor that counter-extremism interventions will have a significant long-term impact. Nonetheless, platforms should iterate on these efforts and work with researchers to improve them. They should not face liability for such experiments. The First Amendment almost certainly protects all of these interventions, but so does §230—and reform of that provision should protect the ability of platforms to engage users in this way.

It is important to note that the seven mechanisms above do not explicitly include supporting product development itself. In some ways, this is a shortcoming. Core product choices (encryption, the size of allowed groups, whether content should be public or private by default, and so on) have a major impact on whether a particular platform is attractive to users, violent extremists or not. Trust and safety teams do consult on those product choices and, in optimal situations, on whether to launch risky products. As a general rule, trust and safety professionals should be integrated more fully into those decisions. At the same time, we should not imagine that there is a perfect platform design that will prevent all abuse. Every digital platform is susceptible to abuse, especially by small, motivated actors.

The CasaPound Problem

Dual-use laws are as dangerous as dual-use technology. Silicon Valley still suffers from the presumption that well-intentioned products created by well-intentioned engineers will be used only by well-intentioned users. This is false. So, too, is the idea that well-intentioned laws drafted by well-intentioned policymakers will only be used by well-intentioned citizens.

Hate groups have access to the judicial system just as they do the internet. Indeed, the relative caution of governments compared to technology platforms in designating hate and terrorist groups means that many entities banned by platforms are legal. Broadly speaking, this is appropriate, but it means that such hate groups may have standing to use litigation to pressure platforms. Numerous cases have been filed in the United States challenging platform policy decisions, including by prominent white supremacists.

It’s hard to know how these suits would be resolved absent §230, and the passage of and judicial disputes over the Texas and Florida social media laws, along with Gonzalez and Taamneh, create even more ambiguity. But judges often do cite §230 in dismissing such cases. Conversely, a case in Italy illustrates how radical groups might pressure platforms absent the protections that the First Amendment and §230 afford.

CasaPound is an Italian neo-fascist, Mussolini-celebrating, anti-immigrant organization that champions key ideological figures of the modern, global neo-fascist movement, such as the 20th century thinker Julius Evola. The group has been linked to various attacks against immigrants and Roma, but it has also sponsored candidates for office at the local level. Despite Italian constitutional prohibitions regarding fascism, “CasaPound Italia” remains legal in Italy.

A leaked version of Meta’s “Dangerous Individuals and Organizations” list indicates the social media giant determined that CasaPound is proscribed on Meta platforms. Nonetheless, CasaPound’s Facebook page is operational today. Why? The reason is not an enforcement error by Facebook, but a protracted legal battle waged by CasaPound against the platform. Per the highly useful summary produced by the Global Freedom of Expression program at Columbia University, CasaPound’s Facebook page was removed on September 19, 2019, along with an administrator of the page. CasaPound sued in Italian court, arguing that it was a political party and that Facebook had restricted its ability to “contribute by democratic means to national policy.” In a preliminary injunction, the court both ordered Meta to reinstate the page and, critically, to pay a penalty of 800 euros directly to CasaPound for every day the page remained unpublished.

Meta appealed the original injunction, arguing that, as a private actor, it does not have a public responsibility to protect CasaPound’s ability to participate in politics and that the contractual relationship between Meta and CasaPound clearly states that the platform has the authority to remove the page for Community Standards violations. The court was not convinced by either argument, deciding on April 29, 2020 that Meta must reinstate the page and pay CasaPound 12,000 euros. The page is now reinstated. To my knowledge the fine has not been paid, pending continued legal proceedings about the scope of the decision, including whether the CasaPound page must be reinstated globally or only in Italy.

The CasaPound case is not the only instance where extremist entities have used national courts to challenge social media bans. In India, the Hindu nationalist group Sanatan Sanstha sued Meta after the company removed its core Facebook pages. An Indian court ultimately determined in 2021 that Meta had the authority to do so based on the original contract. In Germany, Heike Themel, a politician from the far-right AfD party, sued Meta after the platform removed her response to a comment calling her a “Nazi slut,” in which she suggested she might use weapons to resolve the dispute. Themel’s account was suspended for 30 days, but a regional German court determined that Meta’s Community Standards were not a sufficient rationale to remove content if the material did not meet the German legal definition of hate speech.

These cases do not provide anything like a clear global precedent. Sanatan Sanstha is a fringe group even within the larger Hindu nationalist movement, and it is not clear how courts would adjudicate a similar case involving a more politically connected group. German courts have not spoken with one voice on the ability of platforms to remove content based on their own policies. But the wider German approach to digital harms is, effectively, to deputize platforms to enforce German hate speech laws while limiting their authority to exercise independent judgment.

The CasaPound case illustrates that noxious actors will use the law to compel platforms if given the opportunity. That case, and others like it, also illustrates that—contrary to received wisdom—platform hate speech and hate organization rules are often more aggressive than local laws. This is unsurprising: plenty of physical businesses would expel customers for loudly spouting hate speech and violent threats, even if that speech is technically legal. At a base level, it is a matter of creating a welcoming environment for the broader clientele.

“Generally speaking, individuals and organizations subject to platform removals are unlikely to win court challenges because of First Amendment protections for private actors, including platforms.”

Given the First Amendment, it is unlikely that the reasoning in any of these decisions would apply in the United States, even if §230 were discarded completely. It is worth noting, however, that the Texas prohibition on viewpoint discrimination would presumably apply even if the “viewpoint” of the speaker is that of a Nazi. Nonetheless, eliminating §230 protections would incentivize platforms to align their rules and enforcement posture with positions easily defensible in court. In the United States, that may mean lowering, not raising, policy standards and enforcement criteria around terrorism, hate, and incitement to violence.

Generally speaking, individuals and organizations subject to platform removals are unlikely to win court challenges because of First Amendment protections for private actors, including platforms. Yet in the absence of §230, even failed lawsuits would take longer to adjudicate and could be prohibitively expensive for small platforms to manage. In some settings, such lawsuits might succeed, causing further problems for platforms. The end result is likely to be more hate online, not less.

Conclusion

More oversight of the modern internet is necessary. But, reform is no panacea and could exacerbate some problems. Adjustments to §230 that compel platforms to focus their enforcement solely on illegal speech will likely lead to more severe harms, including hate speech, celebration of terrorism, and incitement to violence. Creating liability regimes around proscribed organizations, like those on the FTO list, will likely also induce platforms to redirect resources away from managing domestic extremists—including white supremacist groups—and toward the various violent Islamist groups that dominate the FTO list. At the same time, reforms that aim to restrict legal speech by increasing civil liability for enforcement decisions pose real First Amendment challenges and risk exploitation by extremist groups. At the very least, we should expect that wealthy advocates will use their largesse to apply legal pressure on platforms in a manner similar to billionaire Peter Thiel’s campaign to fund litigation against the publication Gawker, which ultimately led to the shuttering of the website.

“More oversight of the modern internet is necessary. But, reform is no panacea and could exacerbate some problems.”

Highlighting these risks may seem counterintuitive. Silicon Valley companies were woefully slow to mitigate the risks associated with managing massive platforms, hampered by a naive optimism about the power of connectivity, market pressure to innovate quickly, and the need to allocate resources to profit centers. However, consumer demand and government urging did eventually spur action, and the largest platforms now have extensive programs both to root out noxious material and to refer the most serious cases to law enforcement. The overwhelming majority of what many companies remove, including on Meta platforms, goes beyond any prospective legal requirement.

Calls for increased regulation often rest on an overly rosy picture of the historical internet, which is juxtaposed with today’s more problematic version. It is fair to point to uniquely problematic elements of the modern internet, including social media, but these arguments too often minimize the long history of extremism online. Digital extremism is not simply a function of engagement-based advertising or so-called surveillance capitalism. Extremists adapt to a wide range of online features—and have for more than 40 years. Noting this history does not minimize the fact that modern communications empower fringe extremists, nor does it preclude other reasonable critiques of modern digital business models.

But I fear that the focus on such features is driven by political analysis rather than focused assessment of violent extremism itself. In an effort to motivate political will for regulatory reform, tech critics have emphasized a theory of extremist activism online that centers on the structure of the platforms themselves. This approach is problematic for a variety of reasons, but several stand out. First, the most rigorous studies do not substantiate the idea that recommendations and ads are the primary drivers of extremism, either online or in the real world. Reforming these practices is therefore unlikely to address the growing embrace of anti-democratic movements and political violence in American society, even as political debate on these issues has crowded out discussion of more fundamental questions.

Second, ascribing the rise of extremism online to a limited set of features creates a moral hazard for innovators building platforms that do not rely on those features. Silicon Valley, and the activists that pressure it, must recognize the inevitability that bad actors will attempt to abuse all digital platforms—and build appropriate defenses as a result. Trust and safety professionals across industry certainly know this. I fear that the motivated reasoning and political logic that led activists to focus on a specific set of features as driving most harm has inadvertently given technologists not steeped in digital harms a way to elide responsibility—so long as they avoid that limited feature set.

And, finally, technology evolves and extremists adapt. Policy and advocacy built around specific features ignore that inevitability. Tactics must shift in any adversarial engagement, but feature-focused regulatory demands will be too slow in an environment where extremists change their behavior daily.

Part of the problem is that the community studying extremism online remains overly bifurcated. An older community of terrorism researchers has long examined how extremists use the internet, but this group still does not adequately engage with communications researchers studying the platforms themselves. The former tend to find motive in the individuals and organizations themselves; the latter in the incentives produced by platform design. A younger generation of scholars and researchers bridges this gap, but it is telling that the useful and widely cited bibliography of “Social Media and Political Dysfunction” produced by Haidt and Bail does not cite even contemporary extremism and terrorism journals, let alone older studies of extremist activity online.

Relatedly, researchers and the media increasingly conflate disinformation (actively creating falsehoods, including by misrepresenting a speaker or information source), misinformation (false or inaccurate information), and violent extremism. This matters because these problems often manifest very differently online, with disinformation and misinformation generally appearing as less acute harms that occur more often, and violent extremism as more acute, lower-prevalence harm. A researcher who studies only one dimension, severity or prevalence, can dramatically distort public understanding. For example, an analysis of the violence on January 6 that examines only public disinformation about the 2020 election, but not the organized plotting in encrypted messaging platforms by the Proud Boys and Oath Keepers (and vice versa), is misguided.

Principles for policymaking

Platforms can and should be better than they are, but they cannot be perfect. Content moderation, whether conducted by human beings, classifiers, or some combination of the two, necessarily demands difficult trade-offs made with incomplete and ambiguous information. Any system will include false positives and false negatives. Using classifiers is necessary at scale, yet they are wildly imperfect. As a result, §230 reforms that suggest strict platform liability for individual posts are unworkable. There is simply so much dangerous material that any reasonable threshold will be reached. Lawmakers might try to specify some percentage of content that could acceptably slip through platform defenses, but this would be extremely difficult to measure and is probably politically impossible. The prospect of adjudicating such a principle across platforms is nightmare-inducing.

“Platforms have built systems to mitigate the risk of real-world harm by using imperfect calculations of the probability that content violates (or comes close to violating) platform rules to impact distribution and availability of such material, including through features like recommendation systems.”

The development of content moderation tools outside the binary of leaving up or taking down content has intertwined §230’s two conceptually distinct components—first, immunizing platforms from liability for content posted by their users; second, stating that platforms may not be held liable for the repercussions of their content moderation decisions. Platforms have built systems to mitigate the risk of real-world harm by using imperfect calculations of the probability that content violates (or comes close to violating) platform rules to shape the distribution and availability of such material, including through features like recommendation systems. Such efforts lean on the elements of §230 that protect content moderation, but because they also wind up leaving harmful content on the platform, they rely on the first protection as well. That means that removing the liability protections in §230’s first clause cannot be separated cleanly from the content moderation protection embedded in the second.

Because of the ubiquity of classifiers running across digital content, eliminating §230 would likely require regulators to address the question of when a platform came to “know” that material on its service was harmful. When the classifier indicates the material is 80% likely to be hate speech? Fifty percent? The epistemological question is daunting, let alone the operational one—especially when you consider that platforms all use different classifiers that are not directly comparable. Even a score indicating that content is 80% likely to violate is only an estimate, and it cannot easily be compared to a score produced by a different classifier operating on a different platform. Regulating such conditions is likely to fail.
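To see why a single regulatory threshold cannot translate cleanly across platforms, consider the invented example below: two classifiers score a small, manually labeled sample of content, and at the same nominal 0.8 threshold one flags mostly violating material while the other flags mostly benign material. The data are fabricated solely to illustrate the calibration problem.

```python
# Invented example: two platforms' classifiers scored against a small,
# manually labeled sample. At the same 0.8 threshold, the score means
# something very different in practice on each platform.

platform_a = [  # (classifier score, actually violating?)
    (0.85, True), (0.82, True), (0.81, False), (0.90, True), (0.83, True),
]
platform_b = [
    (0.84, False), (0.88, True), (0.81, False), (0.86, False), (0.92, True),
]

def precision_at(threshold: float, scored: list[tuple[float, bool]]) -> float:
    """Share of content flagged above the threshold that truly violates."""
    flagged = [violating for score, violating in scored if score >= threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0

print("Platform A precision at 0.8:", precision_at(0.8, platform_a))  # 0.8
print("Platform B precision at 0.8:", precision_at(0.8, platform_b))  # 0.4
```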

So, if making fundamental changes to §230 is unwise, what should regulators do?

Clarify rules about acute threats and sanctions

Platforms play a critical but often unobserved role in identifying and addressing real-world threats. Reform must avoid disincentivizing such efforts and should instead require that they be more transparent, partly as a measure to prevent abuse of such mechanisms. To achieve that end, regulators should protect good faith disclosures to law enforcement from liability; reduce ambiguity in the Stored Communications Act and the Electronic Communications Privacy Act (18 U.S. Code §2702 comes to mind) about when platforms may proactively provide user data to law enforcement; and require general public transparency about such disclosures. The U.S. federal government should also explicitly support platforms that refuse to provide information in response to law enforcement requests—international or domestic—that are overly broad or poorly predicated.

Likewise, Congress–not the Supreme Court–should clarify the degree to which sanctions targeting terrorists create criminal or civil liability for companies. In general, companies should not be liable for single pieces of content, even awful ones. They should be incentivized to establish robust systems to identify and remove material. Congress should delegate to the State Department and the Treasury Department the authority to publicly and transparently pause or clarify restrictions on terrorist speech online—for example, during peace processes, such as when former leaders of the Revolutionary Armed Forces of Colombia (FARC) ran for office in that country before the U.S. delisted FARC as a Foreign Terrorist Organization.

System, not anecdote

In general, regulation should not create strict liability for moderation missteps, but instead should require companies to demonstrate they are responsible stewards of digital communication systems. Structured correctly, such an approach would require transparency and incentivize significant compliance efforts by platforms. In broad terms, this is how the European Union’s Digital Services Act aims to work. (I’m not endorsing all of the Digital Services Act, which defers so many key decisions to bureaucratic regulators that it is difficult to assess.)

Of course, this will be easier said than done. To be effective, a regulatory regime designed to assess platform defenses will require detailed disclosures from industry and a major effort to set linguistic and conceptual standards across the industry. Regulators might begin by requiring disclosures from companies regarding the seven categories of trust and safety efforts described above: how companies define harm; how they identify and remove violations; what sort of feature limitations they utilize and under what circumstances; what measures they have to responsibly implement legal restrictions on content; how they engage law enforcement; what sort of engagement they have with industry bodies like NCMEC and GIFCT; and how they communicate with users, both about strategic risks and related to specific incidents.

The problem with such an approach is that it risks regulatory capture by industry and likely favors large companies with already-robust trust and safety mechanisms. There is a range of evidence from Europe that regulatory efforts already impede smaller, innovative companies more than larger, established ones. Despite risks to the competitive environment, trust and safety regulation should not exclude small companies, for the simple reason that a great deal of real-world harm is organized on them. Technology reporters and activists have focused on the misinformation on Facebook and other large platforms that contributed to the January 6 protest and violence. But indictments of members of the Proud Boys and Oath Keepers illustrate the central role of Telegram, Signal, thedonald.win, Zello, and Parler in the concrete operational planning of individuals who allegedly planned in advance to attack the Capitol. Despite getting relatively little media attention, this dynamic is not exceptional, and it will only become more common in the future.

Transparency built around surfaces, not market capitalization

There are ways to structure systemic oversight mechanisms without tipping the scales too much toward larger, more established companies. Some regulatory proposals do this by adjusting requirements based on user numbers, market capitalization, or revenue. None of these metrics adequately approximates risk. A better approach is to condition requirements on the complexity of the platform itself. Regulatory disclosures should be indexed to the number of “sub-applications” and “surfaces” that a platform must defend. On Twitter, the sub-applications might include the core feed and Twitter Spaces, among others. The Facebook platform has many sub-applications, including Newsfeed, Marketplace, and Watch. Each of those sub-applications includes multiple surfaces. On Facebook’s Newsfeed, those include posts, comments, and profile-level features like the About section. Essentially, a surface is any discrete space where a user can create content.
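As a minimal sketch of what surface-indexed disclosure might look like, the structure below organizes reporting by platform, sub-application, and surface, and flags surfaces with no reported defenses. The schema and all names in it are hypothetical illustrations of the idea, not a proposed standard.

```python
# Hypothetical structure for surface-indexed transparency reporting.
# Platform, sub-application, and surface names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Surface:
    name: str                                           # e.g., "posts", "comments"
    defenses: list[str] = field(default_factory=list)   # e.g., ["hash_matching"]

@dataclass
class SubApplication:
    name: str                                            # e.g., "newsfeed", "marketplace"
    surfaces: list[Surface] = field(default_factory=list)

@dataclass
class PlatformDisclosure:
    platform: str
    sub_applications: list[SubApplication] = field(default_factory=list)

    def undefended_surfaces(self) -> list[str]:
        """Surfaces with no reported defensive mechanism: the gaps a regulator would probe."""
        return [
            f"{app.name}/{surface.name}"
            for app in self.sub_applications
            for surface in app.surfaces
            if not surface.defenses
        ]

report = PlatformDisclosure(
    platform="example_platform",
    sub_applications=[
        SubApplication("newsfeed", [
            Surface("posts", ["hash_matching", "violence_classifier"]),
            Surface("comments", []),  # no defenses reported: a disclosure gap
        ]),
    ],
)
print(report.undefended_surfaces())  # ['newsfeed/comments']
```

The point of indexing on surfaces rather than company size is visible in the last line: the disclosure itself surfaces the undefended terrain, regardless of how large or small the platform is.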

For general users of the internet, sub-applications and surfaces are part of integrated product experiences. But to the engineers building such systems, and the people that must defend them, these surfaces are fractured. They operate on different backend systems, which means that a defensive mechanism deployed on one surface may not work well, or at all, on another. In traditional military analysis, defenders are often thought to have “interior lines,” which provide a logistical advantage over attackers using “exterior lines.” In digital space, the situation is often inverted: defenders must contend with fractured terrain—sub-applications and surfaces—that confounds resource reallocation and makes it difficult to smoothly integrate one defensive system with another. Meanwhile, attackers flit across the consumer-friendly elements of the web to shift, probe, and iterate.

Attrition is not a particularly meaningful concept in these digital fights. Except in the most extreme cases, attackers face only minimal costs for failure, though increased defenses can reduce the value proposition of conducting operations. Structuring regulatory requirements—even if just on the level of transparency—by surface will incentivize companies to invest in core systems that actually scale across the digital terrain that must be defended.

This approach would be challenging for companies to implement, but it would have several major benefits. First, it reflects risk. Every sub-application and digital surface is a potential forum for abuse, so companies should describe their efforts to defend every surface. Otherwise, companies can simply describe their most state-of-the-art mechanisms, even if they are only applied on a narrow subset of the overall surfaces. This is akin to French officers prior to World War II celebrating France’s defenses based solely on the strength of the Maginot Line.

Second, the surface-based approach scales naturally. Smaller platforms would only need to describe and explain their processes on less-complex products. Larger platforms will need to invest significant resources into describing defenses on various surfaces, but once they have established baseline processes for such reporting, disclosures will be vastly simplified.

Third, senior leaders at some companies probably do not fully understand the operational gaps in their defenses. These reports should highlight those limitations and, in doing so, give executives a blueprint for trust and safety investments.

Fourth, a surface-based regulatory approach would incentivize companies to both build new products thoughtfully and invest in integrated, scalable trust and safety infrastructure. Reporting demands and potential sanctions for failing to defend surfaces will incentivize companies to centralize and integrate trust and safety infrastructure into their product choices. Similarly, companies will have a strong incentive to ensure new surfaces are built with trust and safety features in mind from launch.

Trust and safety does not directly generate revenue, though there is significant evidence that users who feel a platform is safe are more likely to remain users. The goal of trust and safety regulation should not be to punish companies for commercial success by piling on new requirements when the platform gains users. Rather, it should be to encourage companies to build scalable systems to mitigate the risks that manifest on their products. A surface-based approach is the most direct mechanism for doing so.

This approach does have risks. It would require building a sophisticated, costly regulatory infrastructure. It will mean defining concepts like “surface” and “sub-application” far more granularly than I have here. It will not prevent all harm. It will require innovators to slow down until they build flexible, scalable trust and safety infrastructure. This is, to some extent, the point—but it is also potentially costly in an environment where technical innovation matters for both economic and direct national security purposes. It is one of the tradeoffs policymakers must weigh.

“Reform should focus on clarifying currently ambiguous requirements, demanding transparency, and incentivizing companies to innovate more responsibly.”

Adjusting technology regulation, especially §230, is risky. Proposals to make platforms directly liable for any noxious activity on their platforms, or for content moderation efforts that run afoul of a well-heeled hate group (or even just a litigious billionaire), may backfire. Reform should focus on clarifying currently ambiguous requirements, demanding transparency, and incentivizing companies to innovate more responsibly. No one should be under any misapprehension that such measures will prevent dangers online. But neither will more dramatic §230 reform. This is a fundamentally adversarial space; dangerous actors online can be disrupted and, to some extent, deterred. Unless they are defeated in the physical world, however, they will not be eliminated.
