Detecting and mitigating bias in natural language processing

Editor's note:

This report from The Brookings Institution’s Artificial Intelligence and Emerging Technology (AIET) Initiative is part of “AI and Bias,” a series that explores ways to mitigate possible biases and create a pathway toward greater fairness in AI and emerging technologies.

Executive Summary

Unsupervised artificial intelligence (AI) models that automatically discover hidden patterns in natural language datasets capture linguistic regularities that reflect human biases, such as racism, sexism, and ableism. These unsupervised AI models, namely word embeddings, provide the foundational, general-purpose, numeric representation of language for machines to process textual data.

Word embeddings identify the hidden patterns in word co-occurrence statistics of language corpora, which include grammatical and semantic information as well as human-like biases. Consequently, when word embeddings are used in natural language processing (NLP), they propagate bias to supervised downstream applications contributing to biased decisions that reflect the data’s statistical patterns. These downstream applications perform tasks such as information retrieval, text generation, machine translation, text summarization, and web search, in addition to consequential decision-making during resume screening for job candidate selection, university admissions automation, or essay grading. Word embeddings play a significant role in shaping the information sphere and can aid in making consequential inferences about individuals. Job interviews, university admissions, essay scores, content moderation, and many more decision-making processes that we might not be aware of increasingly depend on these NLP models.
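
To make this propagation concrete, the following minimal sketch (not from this report) shows how pretrained word embeddings typically feed a downstream task: each document is reduced to the average of its word vectors, and a classifier is trained on those features, so any stereotypical association encoded in the vectors flows directly into the model’s scores. The gensim model name “glove-wiki-gigaword-100”, the toy snippets, and their labels are illustrative assumptions, not the data of any real system.

```python
# Minimal sketch: pretrained word embeddings as features for a downstream classifier.
# Assumes gensim's downloadable "glove-wiki-gigaword-100" vectors and scikit-learn.
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression

vectors = api.load("glove-wiki-gigaword-100")  # static, general-purpose word embeddings

def embed(text):
    """Average the vectors of in-vocabulary tokens; return zeros if none are found."""
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        return np.zeros(vectors.vector_size)
    return np.mean([vectors[t] for t in tokens], axis=0)

# Hypothetical labeled snippets standing in for resumes judged strong (1) or weak (0).
texts = ["managed engineering team", "volunteered at community arts program",
         "led software development project", "organized family outreach event"]
labels = [1, 0, 1, 0]

clf = LogisticRegression().fit(np.stack([embed(t) for t in texts]), labels)

# Whatever associations the embeddings encode now shape this score.
print(clf.predict_proba([embed("women in technology leadership")]))
```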

Billions of people using the internet every day are exposed to biased word embeddings. However, no regulation is in place to audit these AI technologies, which pose potential threats to equity, justice, and democracy. As a result, there is an urgent need for regulatory mechanisms, a diverse AI ethics workforce, and technical approaches to prevent AI technologies from accelerating their harmful side effects.

Understanding Bias in Natural Language Processing (NLP)

Amazon’s automated resume screening for selecting the top job candidates turned out to be discriminating against women in 2015. Amazon had trained its recruitment models on resume samples of job candidates from a 10-year period. This supervised downstream NLP application learned to score candidates by discovering linguistic patterns in previous resumes, paired with information about how successful each candidate had been at Amazon. As a result, the trained model learned the historical trends associated with employment at Amazon. Because women were underrepresented in the training set collected from employees, the resume screening model associated men and the linguistic signals on their resumes with successful employment at Amazon, whereas resumes that contained words associated with women were frequently discarded by the algorithm. The biased patterns learned by the model led to discrimination against female job candidates, and Amazon abandoned the automated recruitment tool soon after it discovered the bias.

NLP applications’ biased decisions not only perpetuate historical biases and injustices, but potentially amplify existing biases at an unprecedented scale and speed. Future generations of word embeddings are trained on textual data collected from online media sources that include the biased outcomes of NLP applications, information influence operations, and political advertisements from across the web. Consequently, training AI models on both naturally and artificially biased language data creates an AI bias cycle that affects critical decisions made about humans, societies, and governments.

AI and NLP technologies are not standardized or regulated, despite being used in critical real-world applications. Technology companies that develop cutting edge AI have become disproportionately powerful with the data they collect from billions of internet users. These datasets are used to develop AI algorithms and train models that shape the future of both technology and society. AI companies incorporate these systems into their own platforms, in addition to developing systems that they sell to governments or offer as commercial services.

“Technology companies that develop cutting edge AI have become disproportionately powerful with the data they collect from billions of internet users.”

In the absence of regulation and readily available bias auditing mechanisms, AI companies have not provided transparency about the everyday effects of the algorithms they deploy in society. For example, Google’s machine translation algorithms convert the gender-neutral Turkish sentences “O bir profesör. O bir öğretmen” to the English sentences “He’s a professor. She is a teacher.” Facebook ran human subject experiments on its platform to study how to manipulate users’ emotions via biased text that induces associations of unpleasantness.

Social media platforms automatically decide which users should be exposed to certain types of content present in political advertisements and information influence operations, based on personality characteristics predicted from their data. As researchers identify and measure the harmful side effects of NLP algorithms that incorporate biased models of language, regulation of algorithms and AI models can help alleviate the harmful downstream impacts of large-scale AI technologies.

Biases in word embeddings

In 2017, at Princeton University’s Center for Information Technology Policy, Joanna Bryson, Arvind Narayanan, and I developed methods demonstrating that word embeddings learn human-like biases from word co-occurrence statistics. When words representing concepts appear frequently with certain attributes, word embeddings learn to associate the concept with the co-occurring attributes. For example, sentences that contain words related to kitchen or arts tend to contain words related to women. However, sentences that contain career, science, and technology terms tend to contain words related to men. As a result, when machines are processing language to learn word embeddings, women, as a social group, appear in close proximity to words like family and arts relative to men; whereas, men, as a social group, appear in close proximity to career, science, and technology. We found that stereotypical associations exist for gender, race, age, and intersections among these characteristics. When these stereotypical associations propagate to downstream applications that present information on the internet or make consequential decisions about individuals, they disadvantage minority and underrepresented group members. As long as language corpora used to train NLP models contain biases, word embeddings will keep replicating historical injustices in downstream applications unless effective regulatory practices are implemented to deal with bias.
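
The differential association test at the heart of that study, the Word Embedding Association Test (WEAT), can be summarized in a few lines. The sketch below is an abridged illustration rather than the published implementation; it assumes pretrained vectors are available as a dict-like object named vectors (as in the earlier sketch), and the word sets are shortened stand-ins for the study’s stimuli.

```python
# Abridged sketch of a WEAT-style effect size: how much more strongly target set X
# (vs. target set Y) is associated with attribute set A (vs. attribute set B).
import numpy as np

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B):
    """Mean cosine similarity of word w with attributes A minus attributes B."""
    return (np.mean([cos(vectors[w], vectors[a]) for a in A])
            - np.mean([cos(vectors[w], vectors[b]) for b in B]))

def weat_effect_size(X, Y, A, B):
    """Standardized difference in association between target sets X and Y."""
    x_assoc = [assoc(x, A, B) for x in X]
    y_assoc = [assoc(y, A, B) for y in Y]
    return (np.mean(x_assoc) - np.mean(y_assoc)) / np.std(x_assoc + y_assoc)

# Shortened, illustrative word sets.
male = ["he", "man", "son", "brother"]
female = ["she", "woman", "daughter", "sister"]
career = ["career", "salary", "office", "business"]
family = ["family", "home", "children", "relatives"]

# A positive value indicates male terms sit closer to career words than female terms do.
print(weat_effect_size(male, female, career, family))
```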

Racial bias in NLP

Studying biases in widely used word embeddings trained on a corpus of 800 billion words collected from the web reveals that names of African Americans tend to co-occur with unpleasant words. Measuring the relative association of names of African Americans vs. names of white people with pleasant and unpleasant words shows that the word embeddings contain negative associations for the concept of an African American social group due to the biased depiction of the group on the internet. These types of associations that reflect negative attitudes toward one social group are considered harmful and prejudiced. Similar negative associations are reflected for the elderly and people with disabilities. And women are often associated with family and literature, whereas men are associated with career and science. It is also worth noting that state-of-the-art language models generally capture the stereotypes and biases present in American culture, even though these NLP technologies are employed across the world.

In 2004, a controlled study on labor market discrimination found that resumes containing uniquely white names received 50 percent more callbacks for interviews than resumes with uniquely African American names and the same qualifications. Using the job applicant names provided in that study to quantify bias in word embeddings exposes strong negative associations with African Americans as a social group. While humans make consequential decisions about other humans on individual or collective bases, black-box NLP technologies make large-scale decisions that are deterministically biased. Accordingly, society faces a more significant and accelerated challenge than it does with human decision-makers, since NLP is not regulated to promote equity and social justice.
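
As a rough illustration, the same effect-size function from the sketch above can be reused with first names as targets and pleasant or unpleasant words as attributes; the lists below are abridged stand-ins echoing the study’s stimuli, not the full sets.

```python
# Reusing weat_effect_size from the sketch above with abridged, illustrative lists.
european_names = ["emily", "greg", "anne", "matthew"]
african_american_names = ["lakisha", "jamal", "ebony", "darnell"]
pleasant = ["joy", "love", "peace", "wonderful"]
unpleasant = ["agony", "terrible", "awful", "failure"]

# A positive value indicates the first name set is more associated with pleasant words.
print(weat_effect_size(european_names, african_american_names, pleasant, unpleasant))
```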

Gender bias in NLP

State-of-the-art large language models that learn dynamic, context-dependent word embeddings, such as the multi-million-dollar model GPT-3, associate men with competency and with occupations requiring higher levels of education in downstream NLP tasks. Many experts consider the text generated by GPT-3 indistinguishable from human-generated text based on various criteria. Regardless, when prompted for language generation with the input “What is the gender of a doctor?” the first answer is, “Doctor is a masculine noun;” whereas, when prompted with “What is the gender of a nurse?” the first answer is, “It’s female.”
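
Probing a generative model for this kind of asymmetry amounts to feeding it the prompts and comparing the continuations. Because GPT-3 is only reachable through a gated commercial API, the sketch below uses the publicly available GPT-2 via Hugging Face’s transformers library as a stand-in; its outputs will differ from the GPT-3 answers quoted above.

```python
# Sketch: probing a generative language model with gendered occupation prompts.
# Uses GPT-2 as a freely available stand-in for GPT-3.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

for prompt in ["What is the gender of a doctor?", "What is the gender of a nurse?"]:
    result = generator(prompt, max_new_tokens=20, do_sample=False)
    print(result[0]["generated_text"])
```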

Moreover, word embeddings, whether static or dynamic, associate the intersection of race and gender with the highest magnitude of disadvantaging bias. Like other AI algorithms that reflect the status quo, word embeddings represent all social groups other than white men as minority groups, owing to a lack of accurate and unbiased data to train them. For example, members of multiple minority groups, such as African American women, are strongly associated with various disadvantaging biases compared to the relatively less intense biases associated with their constituent minority groups, African Americans or women. The same strong and potentially harmful biased associations exist for Mexican American women. Consequently, propagation of social group bias to downstream NLP applications such as automated resume screening would not only perpetuate existing biases but potentially exacerbate harmful biases in society in ways that will affect future generations.

The problems of debiasing by social group associations

Word embedding debiasing is not a feasible solution to the bias problems caused in downstream applications, since debiasing word embeddings removes essential context about the world. Word embeddings capture signals about language, culture, the world, and statistical facts. For example, gender debiasing of word embeddings would negatively affect how accurately occupational gender statistics are reflected in these models, which is necessary information for NLP operations. In languages with grammatical gender, gender bias is entangled with grammatical gender information in the word embeddings. Word embeddings are also likely to contain properties that we still haven’t discovered. Moreover, debiasing to remove all known social group associations would lead to word embeddings that cannot accurately represent the world, perceive language, or perform downstream applications. Instead of blindly debiasing word embeddings, a more informed strategy would be to raise awareness of AI’s threats to society and to pursue fairness during decision-making in downstream applications.
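
To see why debiasing discards information, consider one widely discussed approach (not detailed in this report): estimating a “gender direction” from definitional word pairs and projecting it out of every vector. The sketch below, which again assumes a dict-like vectors object, removes the component of “nurse” along that direction; the stereotypical signal shrinks, but so does legitimate statistical signal such as real occupational gender statistics.

```python
# Sketch of projection-based debiasing: remove the component along a bias direction.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

# Approximate a gender direction from a single definitional pair (a simplification).
gender_dir = normalize(vectors["he"] - vectors["she"])

def project_out(v, direction):
    """Return v with its component along the (unit) direction removed."""
    return v - np.dot(v, direction) * direction

nurse_debiased = project_out(vectors["nurse"], gender_dir)

# The gender component drops to ~0, taking real-world statistical context with it.
print(np.dot(normalize(vectors["nurse"]), gender_dir),
      np.dot(normalize(nurse_debiased), gender_dir))
```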

Meanwhile, a diverse set of expert humans-in-the-loop can collaborate with AI systems to expose and handle AI biases according to standards and ethical principles. There are also no established standards for evaluating the quality of datasets used in training AI models applied in a societal context. Training a new type of diverse workforce that specializes in AI and ethics would help prevent the harmful side effects of AI technologies.

What can policymakers do to create fairness in NLP?

Unless society, humans, and technology become perfectly unbiased, word embeddings and NLP will be biased. Accordingly, we need to implement mechanisms to mitigate the short- and long-term harmful effects of biases on society and the technology itself. We have reached a stage in AI technologies where human cognition and machines are co-evolving with the vast amount of information and language being processed and presented to humans by NLP algorithms. Understanding the co-evolution of NLP technologies with society through the lens of human-computer interaction can help evaluate the causal factors behind how human and machine decision-making processes work. Identifying the causal factors of bias and unfairness would be the first step in avoiding disparate impacts and mitigating biases.

To analyze these natural and artificial decision-making processes, proprietary biased AI algorithms and their training datasets that are not available to the public need to be transparently standardized, audited, and regulated. Technology companies, governments, and other powerful entities cannot be expected to self-regulate in this computational context since evaluation criteria, such as fairness, can be represented in numerous ways. Satisfying fairness criteria in one context can discriminate against certain social groups in another context. Moreover, with new AI techniques, desired fairness criteria can be artificially satisfied, while discriminating against minority populations, by applying AI tricks via adversarial machine learning. Meanwhile, it might take centuries to develop sophisticated AI technologies aligned with human values that can self-regulate.

“Diversifying the pool of AI talent can contribute to value sensitive design and curating higher quality training sets representative of social groups and their needs.”

Biased NLP algorithms cause immediate negative effects on society by discriminating against certain social groups and by shaping the biased associations of individuals through the media they are exposed to. Moreover, in the long term, these biases magnify the disparity among social groups in numerous aspects of our social fabric, including the workforce, education, the economy, health, law, and politics. Diversifying the pool of AI talent can contribute to value sensitive design and curating higher quality training sets representative of social groups and their needs. Humans in the loop can test and audit each component in the AI lifecycle to prevent bias from propagating to decisions about individuals and society, including data-driven policymaking. Achieving trustworthy AI would require companies and agencies to meet standards and pass third-party quality and fairness evaluations before employing AI in decision-making.

Technology companies also have the power and data to shape public opinion and the future of social groups with the biased NLP algorithms that they introduce without guaranteeing AI safety. Technology companies have been training cutting edge NLP models to become more powerful through the collection of language corpora from their users, yet they do not compensate users for the centralized collection and storage of this data. This strategy, coupled with financial incentives that depend on users’ personal information, has led to surveillance capitalism and automated discrimination through optimization, at a speed that was not possible with the smaller industrial-scale tools previously available in society. Due to a lack of regulation, these ongoing unethical AI practices have been rapidly undermining equity and democracy.

OpenAI’s GPT-3, the state-of-the-art large commercial language model licensed to Microsoft, is trained on massive language corpora collected from across the web. The computational resources for training GPT-3 cost approximately 12 million dollars. Researchers can request access to query large language models, but they do not get access to the word embeddings or training sets of these models. Consequently, to systematically study these high-impact applications, researchers need enormous resources to replicate the models in order to measure the magnitude of biases and gain insights into how they might be shaping society, public discourse, our values, and our opinions.

Without access to the training data and dynamic word embeddings, studying the harmful side effects of these models is not possible. Conversely, access to word embeddings and data can facilitate new scientific discoveries for social good, including advances such as the discovery of new materials from word embeddings. However, developers of large language models are unable to share the training corpora due to data privacy laws. Moreover, adversarial machine learning researchers recently showed that it is possible to extract training data, including personally identifiable information, from large language models. Researchers, developers, and policymakers desperately need an environment in which to work on these models together; however, the lack of established standards hinders scientific progress and is highly likely to damage society. Passing federal privacy legislation to hold technology companies responsible for mass surveillance is a starting point to address some of these problems. Defining and declaring data collection strategies, usage, dissemination, and the value of personal data to the public would raise awareness while contributing to safer AI.

Bringing together a diverse AI and ethics workforce plays a critical role in the development of AI technologies that are not harmful to society. Among many other benefits, a diverse workforce representing as many social groups as possible may anticipate, detect, and handle the biases of AI technologies before they are deployed on society. Further, a diverse set of experts can offer ways to improve the under-representation of minority groups in datasets and contribute to value sensitive design of AI technologies through their lived experiences.

Other recommendations to debias NLP include:

  • Implementing audit mechanisms to track the magnitude and types of biases in data produced by NLP algorithms, such as information retrieved by social media platforms, would be one step towards understanding how AI bias might be shaping public opinion. Accordingly, an audit could reveal the emergence of new harmful biases, including hate speech or harmful marginalization of social groups.
  • Establishing standards regarding AI model training data to understand which populations the dataset represents and if it has been contaminated by information influence operations, synthetic data generated by large language models, or disproportionate political advertisement.
  • Learning from data security evaluation tasks to reveal whether NLP models are trained on authentic natural language data that has not been manipulated by information influence operations spreading on Facebook, Reddit, Twitter, and other online platforms.
  • Using data quality recommendations to improve the representation of social groups in the corpus and analyzing a priori how the algorithms will behave.
  • Establishing standards for sharing word embeddings, multi-million-dollar language models, and their training data with researchers, which could accelerate scientific progress and benefit society.
  • Regulating NLP algorithms that make consequential decisions so that they satisfy appropriate fairness criteria with respect to protected group attributes.

Conclusion

The complex AI bias lifecycle has emerged in the last decade with the explosion of social data, computational power, and AI algorithms. Human biases are reflected in sociotechnical systems and accurately learned by NLP models via the biased language humans use. These statistical systems learn historical patterns that contain biases and injustices and replicate them in their applications. NLP models, which are products of our linguistic data as well as all kinds of information circulating on the internet, make critical decisions about our lives and consequently shape both our futures and our society. These NLP models sit behind every technology that uses text, such as resume screening, university admissions, essay grading, voice assistants, the internet, social media recommendations, dating applications, news article summarization, machine translation, and text generation. If these new developments in AI and NLP are not standardized, audited, and regulated in a decentralized fashion, we cannot uncover or eliminate the harmful side effects of AI bias or its long-term influence on our values and opinions. Undoing the large-scale and long-term damage of AI on society would require enormous effort compared to acting now to design appropriate AI regulation policy.


The Brookings Institution is a nonprofit organization devoted to independent research and policy solutions. Its mission is to conduct high-quality, independent research and, based on that research, to provide innovative, practical recommendations for policymakers and the public. The conclusions and recommendations of any Brookings publication are solely those of its author(s), and do not reflect the views of the Institution, its management, or its other scholars.

Microsoft provides support to The Brookings Institution’s Artificial Intelligence and Emerging Technology (AIET) Initiative, and Amazon, Google, and Facebook provide general, unrestricted support to the Institution. The findings, interpretations, and conclusions in this report are not influenced by any donation. Brookings recognizes that the value it provides is in its absolute commitment to quality, independence, and impact. Activities supported by its donors reflect this commitment.

  • Footnotes
    1. Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.
    2. Lee, N. T. (2018). Detecting racial bias in algorithms and machine learning. Journal of Information, Communication and Ethics in Society, 16(3), 252-260.
    3. Dastin, J. (2018) “Amazon Scraps Secret AI Recruiting Tool that Showed Bias Against Women.” Reuters.  https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G.
    4. Kramer, A. D., Guillory, J. E., & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24), 8788-8790.
    5. Thorson, K., Cotter, K., Medeiros, M., & Pak, C. (2021). Algorithmic inference, political interest, and exposure to news and politics on Facebook. Information, Communication & Society, 24(2), 183-200.
    6. Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.
    7. Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.
    8. Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American Economic Review, 94(4), 991-1013.
    9. Barocas, S., & Selbst, A. D. (2016). Big Data’s Disparate Impact. California Law Review, 104(671), 671-732.
    10. Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Agarwal, S. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
    11. Guo, W., & Caliskan, A. (2020). Detecting emergent intersectional biases: Contextualized word embeddings contain a distribution of human-like biases. AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society 2021.
    12. Raghavan, M., Barocas, S., Kleinberg, J., & Levy, K. (2020). Mitigating bias in algorithmic hiring: Evaluating claims and practices. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 469-481.
    13. Toney, A., & Caliskan, A. (2020). ValNorm: A New Word Embedding Intrinsic Evaluation Method Reveals Valence Biases are Consistent Across Languages and Over Decades. arXiv preprint arXiv:2006.03950.
    14. Chang, H., Nguyen, T. D., Murakonda, S. K., Kazemi, E., & Shokri, R. (2020). On Adversarial Bias and the Robustness of Fair Machine Learning. arXiv preprint arXiv:2006.08669.
    15. Ali, M., Sapiezynski, P., Bogen, M., Korolova, A., Mislove, A., & Rieke, A. (2019). Discrimination through optimization: How Facebook’s ad delivery can lead to skewed outcomes. arXiv preprint arXiv:1904.02095; Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30(4), 681-694.
    16. Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30(4), 681-694.
    17. Tshitoyan, V., Dagdelen, J., Weston, L., Dunn, A., Rong, Z., Kononova, O., … & Jain, A. (2019). Unsupervised word embeddings capture latent knowledge from materials science literature. Nature, 571(7763), 95-98.
    18. Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., … & Oprea, A. (2020). Extracting Training Data from Large Language Models. arXiv preprint arXiv:2012.07805.