How to deal with AI-enabled disinformation

Editor's note:

This paper was originally published as part of a report jointly produced by Brookings and the Italian Institute for International Political Studies (ISPI), entitled “AI in the Age of Cyber-Disorder.” This report is also part of “AI Governance,” a series from The Brookings Institution’s Artificial Intelligence and Emerging Technology (AIET) Initiative that identifies key governance and norm issues related to AI and proposes policy remedies to address the complex challenges associated with emerging technologies.

Rapid disinformation attacks (i.e., attacks in which disinformation is unleashed quickly and broadly with the goal of creating an immediate disruptive effect) are one of the most significant challenges in the digital ecosystem. Consider the following hypothetical: On the morning of Election Day in a closely contested U.S. presidential election, supporters of one candidate launch a disinformation campaign aimed at suppressing votes for the opposing candidate in a key swing state. After identifying precincts in the state where the majority of voters are likely to vote for the opponent, the authors of the disinformation attack unleash a sophisticated social media campaign to spread what appear to be first-person accounts of people who went to polling places in those precincts and found them closed.

The attackers have done their homework. For the past several months, they have laid the groundwork, creating large numbers of fake but realistic-looking accounts on Facebook and Twitter. They have used those accounts to post and comment regularly on articles covering local and national politics. The attackers have used artificial intelligence (AI) to construct realistic photographs and profiles of account owners and to vary the content and wording of their postings, thereby avoiding the sort of replication likely to trigger detection by software designed to identify false accounts. The attackers have also built up a significant base of followers, both by having some of the attacker-controlled accounts follow other attacker-controlled accounts and by ensuring that the attacker-controlled accounts follow accounts of real people, many of whom follow them in return.

Just after polls open on the morning of Election Day, the attackers swing into action, publishing dozens of Facebook and Twitter posts complaining about showing up at polling locations in the targeted precincts and finding them closed. A typical tweet reads “I went to my polling place this morning to vote and it was CLOSED! A sign on the door instructed me to vote instead at a different location!” Dozens of other attacker-controlled accounts “like” the tweet and respond with similar stories of being locked out of polling places. Other tweets and Facebook posts from the attackers include photographs of what appear to be closed polling stations.

Many legitimate accounts also inadvertently contribute to propagating the disinformation, as people who are unaware the accounts are fake reply to and comment on the disinformation posts. This spurs additional propagation from their followers. The attackers are careful to originate the disinformation from most but not all of the attacker-controlled accounts; the remainder of their accounts are used to propagate it through replies and likes. The attackers know that later on in the day, once the social media companies realize what is happening and take action, this will make it harder to separate the accounts intentionally participating in the disinformation campaign from those doing so unwittingly.

Local television and radio stations quickly pick up the story, providing initial on-air and online coverage of the reported closures. Some but not all of the stations are careful to note the claims have yet to be verified. A few national news chains begin echoing the story with the caveat that they are still awaiting verification. Within 30 minutes of the first social media postings, local reporters arrive on scene at several of the polling locations and find there are no closures. The polls are open, voting is going smoothly, and the people waiting in line to vote express puzzlement when told about the social media claims. The local and national media quickly update their coverage to explain that assertions of closed polling places are false.

But the damage is already done. For the remainder of the day, rumors of closed polls continue to propagate through the social media ecosystem. Many voters in the precincts involved hear only the initial reports of closed polling places and not the follow-up stories declaring those reports false. For some of them, the resulting uncertainty is enough that they decide not to vote. Many others decide to wait until later in the day to vote under the assumption that more time will bring more clarity. This creates a flood of people arriving at polling stations in the mid and late afternoon, resulting in lines with waits that rapidly grow to over an hour. Some people, unwilling or unable to wait that long, go home without voting. In the aggregate, the disinformation attack leads to tens of thousands of lost votes across the state—enough, as it turns out, to change the election outcome at both the state and national level.

The Risks of Disinformation

Hopefully, the scenario outlined above will never happen. But the fact that it could occur illustrates an important aspect of online disinformation that has not received as much attention as it deserves. Some forms of disinformation can do their damage in hours or even minutes. This kind of disinformation is easy to debunk given enough time, but extremely difficult to debunk quickly enough to prevent it from inflicting damage.

Elections are one example of the many domains where this can occur. Financial markets, which can be subject to short-term manipulation, are another example. Foreign affairs could be affected as rumors spread quickly around the world through digital platforms. Social movements can also be targeted through dissemination of false information designed to spur action or reaction among either supporters or opponents of a cause.

Of course, the problems posed by online disinformation designed for short-term impact are not new. In financial markets, the online message boards in the early days of the internet were commonly exploited by people seeking to sow and then rapidly trade on false information about the performance of publicly traded companies. What has changed is the sophistication of the tools that can be used to launch disinformation campaigns and the reach of the platforms used for dissemination. In the late 1990s, unscrupulous traders on financial message boards would need to manually author false rumors and hope they reached a large enough group of traders to move the market. Today, the power of AI can be deployed as a force multiplier, allowing a small group of people to create the level of online activity of a much larger group.

Detecting Disinformation

Disinformation in all of its forms is one of the most vexing challenges facing social media companies. The same false positive/false negative tradeoff that applies in many other domains applies to disinformation detection as well. If social media companies are too expansive in what they classify as disinformation, they risk silencing users who post accurate information about important, timely developments. If companies are too narrow in their classifications, disinformation attacks can go undetected.
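
The tradeoff can be made concrete with a small sketch. The scores and labels below are invented for illustration, and the function name is hypothetical; in practice the scores would come from a detection model. Raising the blocking threshold spares legitimate posts but lets more disinformation through, and lowering it does the opposite.

```python
# Hypothetical detector scores (higher = more likely disinformation) paired
# with ground truth. All values are invented for illustration.
posts = [
    (0.95, True), (0.80, True), (0.65, False), (0.60, True),
    (0.40, False), (0.30, True), (0.20, False), (0.10, False),
]

def error_counts(posts, threshold):
    """Return (false_positives, false_negatives) when every post scoring at
    or above the threshold is blocked."""
    false_positives = sum(1 for score, is_disinfo in posts
                          if score >= threshold and not is_disinfo)
    false_negatives = sum(1 for score, is_disinfo in posts
                          if score < threshold and is_disinfo)
    return false_positives, false_negatives

for threshold in (0.25, 0.50, 0.90):
    fp, fn = error_counts(posts, threshold)
    print(f"threshold={threshold:.2f}  legitimate posts blocked={fp}  "
          f"disinformation missed={fn}")
```

With this toy data, the expansive threshold (0.25) blocks two legitimate posts and misses nothing, while the narrow one (0.90) blocks no legitimate posts but misses three of the four disinformation posts.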

Social media companies are well aware of this tradeoff. For disinformation campaigns designed to act over longer periods of time, in many cases the best approach for social media companies is to act conservatively with regard to blocking content; the harms of waiting to confirm the falsity of information before blocking it are often lower than the harms of inadvertently blocking posts by legitimate users conveying accurate information. Put another way, with disinformation that would inflict most of its damage over a longer time scale, social media companies have the latitude to take the time needed to investigate the accuracy of suspected disinformation posts while still retaining the option, if needed, to act early enough to preempt most of the damage.

Rapid disinformation attacks are particularly hard to address, as they don’t leave social media companies the luxury of time. Consider the Election Day scenario presented above. If a social media company takes several hours to conclude that the reports of closed polling places are false and only then shuts down the attackers’ accounts, the damage will already have been done. By contrast, taking action within minutes could preempt the damage, but that would require a confidence level and a knowledge of the accounts behind the attack that could be nearly impossible to obtain on so short a timeline. Even if confidence in the falsity of the information could be quickly obtained, there would still be the question of which accounts to block. This is especially true if, as in the Election Day scenario, the attack is constructed in a manner to cause legitimate accounts to unwittingly contribute to propagating the disinformation.

For unsophisticated disinformation campaigns, such as those involving a flood of copy-and-paste posts from newly created accounts with few followers, it is a straightforward matter for detection and mitigation software to respond rapidly. Sophisticated attacks like the one described above, however, act and present similarly to legitimate account activity. The time necessary to disentangle what is true and what is not true, and to identify which accounts are acting in good faith and which are not, is far longer than the time over which the disinformation can inflict its most significant damage.

Fortunately, the need to combat online disinformation has received increasing attention from academic researchers, civil society groups, and the commercial sector, including both startups and established technology companies. This has led to a growing number of paid products and free online resources to track disinformation. Part of the solution involves bot detection, as bots are often used to spread disinformation. But the overlap is not complete: bots are also used for many other purposes, some nefarious and some innocuous, and not all disinformation campaigns involve bots. One simple and easily accessible illustration is the set of tools provided by the Observatory on Social Media at Indiana University. One of the tools, Botometer, “checks the activity of a Twitter account and gives it a score. Higher scores mean more bot-like activity.” There are also a growing number of commercial products aimed at detecting and managing bots.

Bots alone are only part of the problem, as not all disinformation campaigns that use bots will be picked up by bot detection software. It is therefore also important to have tools that can look at how suspect content is impacting the broader ecosystem. Another of the tools from Indiana University’s Observatory on Social Media, Hoaxy, can be used to “observe how unverified stories and the fact checking of those stories spread on public social media.” Hoaxy tracks online activity relating to stories and fact checks by third parties of those stories. As useful as Hoaxy is, it does not attempt or purport to draw its own conclusions about the accuracy of a story. Rather, it simply gathers information about what other sources have said about the accuracy of a story, without exploring the extent to which those sources may themselves be accurate. The upstream problem—and the one that is ultimately far more difficult to resolve—is to establish whether an online claim is true or false.

The Challenge of Data Labels

Responding at sufficiently fast time scales to rapid disinformation attacks will require AI. But AI isn’t magic; for it to be effective in addressing disinformation, it needs access to data as well as to information enabling it to evaluate data accuracy. To explore this further, it is helpful to first consider how AI-based approaches can be used to detect disinformation in the absence of any time pressure, and then to address the additional complexities that arise with the need for fast detection.

Disinformation is easiest to detect when there are large sets of “training data” that have been accurately labeled. Training data is used to enable an AI system to learn, so that when it sees new data that wasn’t in the training set, it knows how to classify it. Consider a drug that has been scientifically proven to be ineffective for curing COVID, but that many social media users and some news sites nonetheless continue to claim is a cure. A training data set can be constructed by 1) compiling and labeling as false a large number of social media posts that incorrectly assert that the drug cures COVID, and 2) compiling and labeling as true a large number of social media posts and news stories that correctly assert that the drug does not cure the illness. A machine learning algorithm can then learn using this training set. This corresponds to “supervised” learning; i.e., learning using a data set that has already been labeled regarding the attribute of interest. Once the training process has been completed, the algorithm will be highly effective at rapidly classifying new social media posts or news stories regarding this drug as either inaccurate or accurate.
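
A minimal sketch of this supervised setup follows. It assumes a six-post training set about an invented drug called “DrugX” and uses a small naive Bayes word-count classifier; both the data and the choice of classifier are illustrative assumptions, and a real system would need far more data and a stronger model.

```python
import math
from collections import Counter

# Tiny hand-labeled training set. The posts and the drug name "drugx" are
# invented for illustration. "false" marks posts that wrongly claim a cure.
training = [
    ("drugx cures covid take it now", "false"),
    ("miracle cure drugx works against covid", "false"),
    ("drugx is a proven covid cure", "false"),
    ("trial shows drugx does not cure covid", "true"),
    ("study finds drugx ineffective against covid", "true"),
    ("doctors say drugx does not work for covid", "true"),
]

def train(examples):
    """Count word occurrences per label for a naive Bayes classifier."""
    word_counts = {"true": Counter(), "false": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the higher log-probability (add-one smoothing)."""
    vocab = set(word_counts["true"]) | set(word_counts["false"])
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in text.split():
            score += math.log((word_counts[label][word] + 1)
                              / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(training)
print(classify("drugx does not cure covid", *model))   # accurate post
print(classify("drugx is a miracle cure", *model))     # inaccurate post
```

On this toy data the first post is classified as “true” and the second as “false”. The classifier has learned nothing about medicine; it has only learned which words tend to accompany each label, which is why the quality and size of the labeled set matter so much.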

There is also a class of machine learning techniques based on “unsupervised” learning, in which the algorithm must learn to identify categories of interest in the data without the benefit of pre-existing labels. An example of unsupervised learning in the context of disinformation can be found in a 2019 paper titled “Unsupervised Fake News Detection on Social Media: A Generative Approach” and published by researchers at Shanghai Jiao Tong University, Penn State University, and Arizona State University. The authors mathematically analyze “users’ engagements on social media to identify their opinions towards the authenticity of news” and use that as a basis to infer “users’ credibility without any labelled data.”
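
To give a feel for how credibility can be inferred without labels, the sketch below uses a much simpler scheme than the generative model in that paper: it alternates between judging each story by a credibility-weighted vote and re-scoring each user by how often they agree with the current verdicts. The users, stories, and function name are all invented for illustration.

```python
# opinions[user][story] is +1 if the user treats the story as real and -1 if
# the user treats it as fake. Users and stories are invented for illustration.
opinions = {
    "r1": {"s1": 1, "s2": -1, "s3": 1, "s4": -1},
    "r2": {"s1": 1, "s2": -1, "s3": 1, "s4": -1},
    "r3": {"s1": 1, "s2": -1, "s3": 1},
    "r4": {"s1": 1, "s2": -1, "s3": 1},
    "b1": {"s1": -1, "s2": 1, "s4": 1},
    "b2": {"s1": -1, "s2": 1, "s4": 1},
    "b3": {"s1": -1, "s2": 1, "s4": 1},
}

def infer(opinions, rounds=5):
    """Alternate between scoring stories and scoring users, with no labels."""
    credibility = {user: 1.0 for user in opinions}
    stories = sorted({s for votes in opinions.values() for s in votes})
    verdict = {}
    for _ in range(rounds):
        # A story's verdict is the sign of the credibility-weighted vote.
        for story in stories:
            total = sum(credibility[u] * votes[story]
                        for u, votes in opinions.items() if story in votes)
            verdict[story] = 1 if total >= 0 else -1
        # A user's credibility is the share of their opinions that match
        # the current verdicts.
        for user, votes in opinions.items():
            matches = sum(1 for s, v in votes.items() if v == verdict[s])
            credibility[user] = matches / len(votes)
    return verdict, credibility

verdict, credibility = infer(opinions)
print(verdict)
print(credibility)
```

A plain head count on story s4 would side with the three-account bloc (three votes to two). After a few rounds, the bloc's record of disagreeing with everyone else on s1 and s2 drives its credibility to zero and the verdict on s4 flips. The same mechanism fails if coordinated accounts form the majority on most stories, which is one reason a scheme this simple would not be sufficient on its own.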

One hurdle to any learning algorithm, whether supervised or unsupervised, is access to a sufficiently large set of training data. Information suitable for use as training data regarding a particular issue or question can take significant time to accumulate on social media. To be useful in AI systems for detecting disinformation, the data would in many (though not all) instances require at least some degree of manual coding at the outset. Such an approach works if the topic at issue is one—such as false claims regarding medical cures—for which the time over which it is important to continue to combat the disinformation is much longer than the time it would take to build and use a large set of training data. But it is far less effective for situations when disinformation defenses need to be deployed very quickly, and in which there will typically be a smaller amount of data that can be used as a basis for the algorithm to learn.

Another issue, both in rapid and in less time-constrained attempts to identify disinformation, is the accuracy of the data labels on which an AI algorithm relies during the learning process. “Noisy” data (data in which the labels are not necessarily accurate) is a well-known problem in machine learning. To take a simple example unrelated to disinformation attacks, consider a machine learning algorithm that is attempting to learn to automatically distinguish images of cars from images of bicycles. To do this, the algorithm might scour the internet and find millions of images that are labeled “car” and millions of other images labeled “bicycle.” In most cases, those labels will be correct. But in some instances, the labels will be incorrect; an image labeled “car” might show a truck, a bicycle, or content completely unrelated to vehicles. The higher the fraction of incorrect labels, the more difficult and slower it will be for the algorithm to learn to accurately distinguish between cars and bicycles. Working with noisy data is an active area of research, and there are emerging approaches that can help mitigate, though not completely eliminate, the loss in accuracy that results when a machine learning algorithm learns from data in which a substantial fraction of the labels are inaccurate.

Attempts to use AI to identify disinformation will likely need to confront noisy data for the simple reason that intentional deception is involved. Most people who post an image of a car to the internet wouldn’t choose to label it “bicycle” just to throw off machine learning algorithms. But disinformation attacks will be associated with a set of conflicting claims about whether online statements are true. Returning to the Election Day example from above, in response to a tweet falsely stating that a polling location is closed, someone who has actually just voted at that location might reply with a tweet stating that the initial tweet is false and that the polling location was in fact open. That reply is, in effect, a label. An account controlled by the attackers might also reply to the initial tweet by asserting that it is true. That reply is also a label, though one that directly contradicts the reply from the real voter. Over short time scales, it would be exceedingly difficult for an algorithm—or a human—to know which label to trust. Responding quickly to disinformation thus requires addressing the twin hurdles of limited data and unreliable—and in some cases, intentionally wrong—labels of that data.

Researchers have recognized these issues and are developing new approaches that do not rely on a large set of pre-existing training data. In April 2020, a team of researchers from Microsoft and Arizona State University posted a pre-publication version of a paper describing new results on techniques for quickly detecting fake news. In the paper the authors note that traditional approaches to detecting fake news “rely on large amounts of labeled instances to train supervised models. Such large labeled training data is difficult to obtain in the early phase of fake news detection.” To address this, the authors introduce a method that requires only a “small amount of manually-annotated clean data,” which can be used to rapidly and automatically label a larger set of data based on posts and comments on news articles by social media users. User credibility is one of the factors considered in forming the labels. According to the authors, this approach “outperforms state-of-the-art baselines for early detection of fake news.” Frameworks like this can not only help solve the problem of limited data, but could potentially also help mitigate labeling accuracy issues.
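
The general idea of stretching a small clean set into a larger weakly labeled one can be sketched as follows. This is not the method from the paper; it is a simplified illustration in which user credibility is estimated from two manually labeled articles and then used to label two unlabeled ones. All names and data are hypothetical.

```python
# Hypothetical data: each user's stance on an article is +1 ("looks real")
# or -1 ("looks fake").
clean_labels = {"a1": 1, "a2": -1}            # small manually annotated set
stances = {
    "a1": {"ann": 1, "bob": -1, "cy": 1},
    "a2": {"ann": -1, "bob": 1, "cy": -1},
    "a3": {"ann": -1, "bob": 1},              # unlabeled
    "a4": {"bob": -1, "cy": 1},               # unlabeled
}

def user_credibility(clean_labels, stances):
    """Share of each user's stances on clean articles that match the label."""
    hits, seen = {}, {}
    for article, label in clean_labels.items():
        for user, stance in stances[article].items():
            seen[user] = seen.get(user, 0) + 1
            hits[user] = hits.get(user, 0) + (stance == label)
    return {user: hits[user] / seen[user] for user in seen}

def weak_label(article, stances, credibility, default=0.5):
    """Credibility-weighted vote. Weights are centered at 0.5 so that a user
    who is right half the time contributes nothing."""
    score = sum((credibility.get(user, default) - 0.5) * stance
                for user, stance in stances[article].items())
    return 1 if score > 0 else -1 if score < 0 else 0

cred = user_credibility(clean_labels, stances)
weak = {a: weak_label(a, stances, cred)
        for a in stances if a not in clean_labels}
print(cred)   # {'ann': 1.0, 'bob': 0.0, 'cy': 1.0}
print(weak)   # {'a3': -1, 'a4': 1}
```

Centering the weights means a user who was wrong on every clean article is treated as an inverse signal. That is a design choice made for this sketch, and with only two clean articles the credibility estimates themselves are very uncertain.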

As the above examples help to convey, one common theme in research addressing disinformation is the importance of measuring the credibility of online sources. Approaches to establish and then leverage credibility will be critical to quickly identifying truth in the presence of a well-constructed rapid disinformation attack. For instance, in the Election Day scenario, it would be advantageous to give high credibility weight to the social media accounts of local television and news stations and the reporters who work at those stations. That way, as soon as those stations are able to identify that the claims of closed polling places are false and disseminate that fact on social media, the AI system can calibrate truth and falsity and move to the next step of addressing the posts known to contain disinformation.
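
As a rough illustration of credibility weighting in the Election Day scenario, the sketch below compares a simple head count of reports with a weighted one. The source categories and weight values are placeholders invented for this example; a platform would have to choose and audit its own.

```python
# Hypothetical credibility weights by source type. The values are
# placeholders, not recommendations.
CREDIBILITY = {"local_news": 0.9, "reporter": 0.8, "established_user": 0.4,
               "new_account": 0.1}

# Each report: (source type, True if it says the polling place is closed).
reports = [
    ("new_account", True), ("new_account", True), ("new_account", True),
    ("established_user", True), ("established_user", False),
    ("reporter", False), ("local_news", False),
]

def closure_confidence(reports, credibility=CREDIBILITY):
    """Credibility-weighted share of reports asserting the claim (0 to 1)."""
    total = sum(credibility[source] for source, _ in reports)
    asserting = sum(credibility[source] for source, says_closed in reports
                    if says_closed)
    return asserting / total

head_count = sum(says_closed for _, says_closed in reports) / len(reports)
print(f"unweighted share saying 'closed': {head_count:.2f}")
print(f"credibility-weighted share:       {closure_confidence(reports):.2f}")
```

Here four of the seven reports say the polling place is closed, yet the weighted share is only about 0.25, because the two reports from the reporter and the local station outweigh the low-credibility accounts.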

It is also important to recognize the limits of what AI can be expected to accomplish. Earlier this year, Samuel Woolley of UT Austin published an excerpt of his book “The Reality Game” in MIT Technology Review. In it, Woolley noted that “There simply is no easy fix to the problem of computational propaganda on social media.” It would be unreasonable to expect that any AI solution available in the near future will be able to quickly and unambiguously identify a rapid disinformation attack. However, AI will certainly be able to provide insight into the dynamics of emerging disinformation attacks, pinpoint at least some of the social media accounts at the source, and compute confidence levels regarding the likely truth or falsity of a claim making the rounds on social media. After that, the response will need to be overseen by humans making decisions based on a combination of the AI outputs and guidance from policy frameworks.

Policy Considerations

Public policy will play a central role in both the human and technological aspects of the response to rapid disinformation attacks. At the technology level, policies will need to be embedded into the algorithms in relation to questions such as: What confidence level that a rapid disinformation attack is occurring should trigger notification to human managers that an activity of concern has been identified? Over what time scales should the AI system make that evaluation, and should that time scale depend on the nature and/or extent of the disinformation? For example, suspected disinformation regarding violence should clearly receive a higher priority for immediate resolution than disinformation associated with conflicting online characterizations of what a politician said at a recent campaign speech. Other questions that can drive policies to be embedded in AI disinformation detection systems include: Under what circumstances should an AI system preemptively shut down accounts suspected of originating a rapid disinformation attack? What types of autonomous actions, if any, should be taken to address posts from legitimate accounts that unwittingly contribute to propagating disinformation?
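
One way to picture policies embedded in an algorithm is as a table of thresholds that vary with the nature of the suspected disinformation. The categories, threshold values, and action names below are hypothetical and serve only to show the shape such a policy might take.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    notify_at: float      # confidence that triggers human review
    auto_act_at: float    # confidence that permits autonomous action

# Hypothetical categories and thresholds, invented for illustration.
# A value above 1.0 means the system never acts on its own.
POLICIES = {
    "violence": Policy(notify_at=0.30, auto_act_at=0.80),
    "election_logistics": Policy(notify_at=0.40, auto_act_at=0.90),
    "campaign_speech": Policy(notify_at=0.70, auto_act_at=1.01),
}

def decide(category, confidence):
    """Map a detector's confidence to an action under the category's policy."""
    policy = POLICIES[category]
    if confidence >= policy.auto_act_at:
        return "restrict_and_notify"
    if confidence >= policy.notify_at:
        return "notify_human"
    return "monitor"

print(decide("violence", 0.50))          # notify_human
print(decide("campaign_speech", 0.50))   # monitor
print(decide("violence", 0.85))          # restrict_and_notify
```

The same confidence level of 0.50 leads to a human notification for suspected disinformation about violence but only to continued monitoring for a dispute over a campaign speech, which mirrors the prioritization described above.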

Policy considerations will be an important driver for the human response as well. When an AI system identifies a potential rapid disinformation attack, managers at social media companies will need a set of guidelines for how to proceed. Policies can also guide the extent to which people at social media companies should arrange in advance to be “on call” to watch for rapid disinformation attacks. It is clear that for short-duration, high-stakes events like a national election, social media companies will need to have people standing by ready to step in and address disinformation. For events in that category, the question is not whether disinformation will be present, but rather how much of it there will be, and how sophisticated the attacks will be.

For most topics and events, there simply won’t be the resources to supply staffing dedicated to individually monitoring each of the essentially limitless list of situations in which disinformation might arise. This is especially true given that companies such as Facebook and Twitter operate globally; there are literally billions of accounts in nearly two hundred countries that could potentially be employed to disseminate disinformation. As a result, for the vast majority of instances of disinformation, human intervention at the social media companies will of necessity occur only after a problem is flagged either algorithmically or through manual reporting channels.

There will also need to be policies for handling situations in which an AI system makes exactly the wrong decision. Because of the limited data available in the early stages of a rapid disinformation attack, the need to quickly make a determination might lead an algorithm to invert truth and falsity and conclude that the disinformation is accurate and that the attempts to debunk it are themselves a disinformation attack. This is a less far-fetched outcome than it might initially appear to be. Algorithms, like the people who design them, can be influenced by a confirmation bias effect, leading to a boost in confidence in a wrong conclusion by selectively giving greater weight to inputs bolstering that conclusion. Particularly given the short time scales of rapid disinformation attacks, this could lead an algorithm to quickly converge on an incorrect conclusion that would need human intervention to identify and invert.
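
A toy model shows how selective weighting can lock in a wrong conclusion. It is an illustrative assumption, not a description of any deployed system: evidence arrives as +1 (the claim is true) or -1 (the claim is false), and inputs that contradict the current belief are scaled down by a discount factor.

```python
def run(evidence, discount):
    """Accumulate +1 (claim true) / -1 (claim false) evidence. Inputs that
    contradict the current belief are scaled by `discount` (1.0 = no bias)."""
    belief = 0.0
    for item in evidence:
        contradicts = belief * item < 0
        belief += item * (discount if contradicts else 1.0)
    return belief

# Attackers post first; debunking arrives later and in greater volume.
evidence = [+1, +1, +1] + [-1] * 5

print(run(evidence, discount=1.0))   # unbiased: ends below zero
print(run(evidence, discount=0.2))   # biased: stays above zero
```

Without the discount, five debunking inputs outweigh three attacker inputs and the belief ends at -2.0. With the discount, the early attacker inputs set the direction, each debunking input counts for only a fifth of its weight, and the belief stays positive despite the larger volume of contrary evidence.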

In short, the combination of a growing social media ecosystem and the availability of increasingly powerful AI tools for content dissemination means that rapid disinformation attacks will be a recurring feature of the online landscape. Addressing these attacks will require further advances in AI, particularly in relation to methods that can quickly assess the reliability of online sources despite the presence of very limited data. It will also require attention within social media companies to ensure that the policies and resources are in place to leverage the capabilities of disinformation detection technology, to complement that with human intervention, and to maximize the likelihood that their platforms will be used to promote, rather than undermine, access to factually accurate information.

The Brookings Institution is a nonprofit organization devoted to independent research and policy solutions. Its mission is to conduct high-quality, independent research and, based on that research, to provide innovative, practical recommendations for policymakers and the public. The conclusions and recommendations of any Brookings publication are solely those of its author(s), and do not reflect the views of the Institution, its management, or its other scholars.

Microsoft provides support to The Brookings Institution’s Artificial Intelligence and Emerging Technology (AIET) Initiative, and Facebook provides general, unrestricted support to the Institution. The findings, interpretations, and conclusions in this report are not influenced by any donation. Brookings recognizes that the value it provides is in its absolute commitment to quality, independence, and impact. Activities supported by its donors reflect this commitment.