Explainability won’t save AI


Much of artificial intelligence, and particularly deep learning, is plagued by the “black box problem.” While we may know the inputs and outputs of a model, in many cases we do not know what happens in between. AI developers make choices about how to design the model and the learning environment, but they typically do not determine the value of specific parameters and how an answer is reached. The lack of understanding about how an AI system works, in some cases even by the people who have developed it, is one of the reasons AI poses novel safety, ethical, and legal considerations, and why oversight and governance are especially important. Black box deep learning models are vulnerable to adversarial attacks and prone to racial, gender, and other demographic biases. Opacity is especially problematic in high-stakes settings such as health care, lending, and criminal justice, where significant harms have already been reported.

Explainable AI (XAI) is often offered as the answer to the black box problem and is broadly defined as “machine learning techniques that make it possible for human users to understand, appropriately trust, and effectively manage AI.” Around the world, explainability has been referenced as a guiding principle for AI development, including in Europe’s General Data Protection Regulation. Explainable AI has also been a major research focus of the Defense Advanced Research Projects Agency (DARPA) since 2016. However, after years of research and application, the XAI field has generally struggled to realize the goals of understandable, trustworthy, and controllable AI in practice.

This gap stems largely from divergent conceptions of what explainability is expected to achieve and unequal prioritization of various stakeholder objectives. Studies of XAI in practice reveal that engineering priorities are generally placed ahead of other considerations, with explainability largely failing to meet the needs of users, external stakeholders, and impacted communities. By improving clarity about the diversity of XAI objectives, AI organizations and standards bodies can make explicit choices about what they are optimizing and why. AI developers can be held accountable for providing meaningful explanations and mitigating risks—to the organization, to users, and to society at large.

The explainability ideal

The end goal of explainability depends on the stakeholder and the domain. Explainability enables interactions between people and AI systems by providing information about how decisions and events come about, but developers, domain experts, users, and regulators all have different needs from the explanations of AI models. These differences are not only related to degrees of technical expertise and understanding, but also include domain-specific norms and decision-making mechanisms. Achieving explainability goals in one domain will often not satisfy the goals of another.

Consider, for example, the different needs of developers and users in making an AI system explainable. A developer might use Google’s What-If Tool to review complex dashboards that provide visualizations of a model’s performance in different hypothetical situations, analyze the importance of different data features, and test different conceptions of fairness. Users, on the other hand, may prefer something more targeted. In a credit scoring system, it might be as simple as informing a user which factors, such as a late payment, led to a deduction of points. Different users and scenarios will call for different outputs.
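The credit scoring example can be made concrete with a minimal sketch of a user-facing “reason code” explanation. Everything here is hypothetical and invented for illustration (real scoring models are far more complex): a linear scorecard adds weighted feature contributions to a base score, and the features that pulled the score down the most become the reasons reported to the user.

```python
# Hypothetical linear credit scorecard: score = base + sum(weight * value).
# Feature names and weights are illustrative, not from any real system.
BASE_SCORE = 650
WEIGHTS = {
    "late_payments": -35,       # each late payment costs points
    "credit_utilization": -80,  # fraction of available credit in use
    "account_age_years": 4,     # older accounts add points
}

def score(applicant: dict) -> float:
    """Compute the scorecard total for one applicant."""
    return BASE_SCORE + sum(WEIGHTS[f] * applicant[f] for f in WEIGHTS)

def reason_codes(applicant: dict, top_n: int = 2) -> list[str]:
    """Return the features that lowered the score the most, as user-facing reasons."""
    contributions = {f: WEIGHTS[f] * applicant[f] for f in WEIGHTS}
    negatives = sorted(
        (c for c in contributions.items() if c[1] < 0), key=lambda c: c[1]
    )
    return [name for name, _ in negatives[:top_n]]

applicant = {"late_payments": 2, "credit_utilization": 0.6, "account_age_years": 3}
print(score(applicant))         # 650 - 70 - 48 + 12 = 544.0
print(reason_codes(applicant))  # ['late_payments', 'credit_utilization']
```

The same model output thus supports two explanation styles: the full contribution breakdown a developer might inspect, and the two-item reason list a user actually receives.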

For now, users and other external stakeholders are typically afforded little if any insight into the behind-the-scenes workings of the AI systems that impact their lives and opportunities. This asymmetry of knowledge about how an AI system works, and the power to do anything about it, is one of the key dilemmas at the heart of explainability. Accessible and meaningful explanations can help reduce this asymmetry, but explanations are often incomplete and can be used (intentionally or not) to increase the power differentials between those creating AI systems and those impacted by them.

Domain differences

To understand the ways practitioners in different domains have different expectations for what they hope to achieve by building explainable AI systems, it is helpful to explicitly compare their goals. Below, I consider how three different domains—engineering, deployment, and governance—articulate the goals of explainable AI.

Engineering. In 2018, the Institute of Electrical and Electronics Engineers (IEEE) published a survey on explainable AI that illustrates how the technical and engineering domain conceptualizes the goals of XAI:

  1. To justify an AI system’s results, for example to ensure that an outcome was not reached erroneously.
  2. To provide better control over the system, for example by providing visibility into vulnerabilities and flaws.
  3. To continuously improve the system, for example by identifying and fixing gaps in the training data or environment to make it smarter and improve its utility.
  4. To discover new information and knowledge about the world, for example by identifying and relaying new patterns and strategies.

Deployment. As AI applications are rolled out, the technology will increasingly interact with human beings, and the deployment domain seeks to understand how explainability impacts the human relationship with an AI system, including in military and other high-stakes contexts. An overview of DARPA’s XAI program provides an example of the deployment domain’s goals for XAI:

  1. To explain an AI system’s rationale, describing not just what happened, but why.
  2. To characterize its strengths and weaknesses, by letting a user know under what conditions the system will successfully accomplish its goals.
  3. To convey an understanding of how the system will behave in the future, enabling a user to know when its use may be warranted and reliable.
  4. To promote human-machine interaction and enable partnership and coordination.

Governance. A policy briefing on explainable AI by the Royal Society provides an example of the goals the policy and governance domain imagines XAI will achieve:

  1. To give users confidence that an AI system is an effective tool for the purpose.
  2. To safeguard against algorithmic bias, for example through the identification of biased correlations due to skewed datasets or model design choices. 
  3. To adhere to regulatory standards or policy requirements.
  4. To meet society’s expectations about how individuals are afforded agency in a decision-making process.

These differences are highlighted in simplified form below.

Engineering: ensure efficacy; improve control; improve performance; discover information.
Deployment: explain its rationale; characterize strengths and weaknesses; inform future expectations; promote human-machine cooperation.
Governance: promote trust; protect against bias; follow regulations and policies; enable human agency.

All three domains agree on the importance of explainability providing assurance that a system is effective and appropriate for its intended task, but the domains also differ in key ways. The engineering domain highlights the importance of control, which is either assumed or not prioritized in the other domains. And while the governance domain stresses the value of human agency, this is not a necessary outcome of the goals in other domains. The engineering domain treats AI systems as constantly in flux and capable of regular improvement, while the other domains apparently expect greater consistency to enable informed expectations and adherence to policies. All three domains also imagine different feedback loops. In the engineering domain, it is engineers’ input that is incorporated; in the deployment domain, it is users’ input; only in the governance domain are the impact on broader communities and the technology’s relation to the broader world taken into consideration.

Explainability in practice

The reality of organizations’ use of explainability methods diverges sharply from the aspirations outlined above, according to a 2020 study of explainable AI deployments. In this study of 20 organizations using explainable AI, the majority of deployments were used internally to support engineering efforts, rather than to build transparency or trust with users or other external stakeholders. The study included interviews with roughly 30 people from both for-profit and non-profit groups employing elements of XAI in their operations. Study participants were asked about the types of explanations they have used, how they decided when and where to use them, and the audience and context of their explanations.

The results revealed that local explainability techniques, which aim to understand a model’s behavior for one specific input, such as feature importance, were the most commonly used. The primary use of the explanations was to serve as “sanity checks” for the organization’s engineers and research scientists and to identify spurious correlations. Participants looking for a more holistic understanding were interested in deploying global explainability techniques, which aim to understand the high-level concepts and reasoning used by a model, but these were described as much harder to implement. Study participants said it was difficult to provide explanations to end-users because of privacy risks and the challenges of providing real-time information of sufficiently high quality. But most importantly, organizations struggled to implement explainability because they lacked clarity about its objectives.
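The local/global distinction can be illustrated with a toy sketch (the linear “model” and its weights are invented for illustration): a local explanation attributes one prediction to individual features, while a global explanation aggregates those attributions across many inputs to rank features model-wide.

```python
import random

# Toy linear model; weights are illustrative stand-ins for a trained model.
WEIGHTS = [0.8, -0.5, 0.1]

def predict(x):
    return sum(w * v for w, v in zip(WEIGHTS, x))

def local_attribution(x, baseline):
    """Local explanation: each feature's contribution to one prediction,
    measured as weight * (value - baseline value)."""
    return [w * (v - b) for w, v, b in zip(WEIGHTS, x, baseline)]

def global_importance(dataset, baseline):
    """Global explanation: average magnitude of each feature's local
    contribution across many inputs."""
    totals = [0.0] * len(WEIGHTS)
    for x in dataset:
        for i, c in enumerate(local_attribution(x, baseline)):
            totals[i] += abs(c)
    return [t / len(dataset) for t in totals]

random.seed(0)
baseline = [0.0, 0.0, 0.0]
data = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(100)]
print(local_attribution(data[0], baseline))  # one input's explanation
print(global_importance(data, baseline))     # model-wide feature ranking
```

For a linear model the local attributions sum exactly to the change in prediction from the baseline; for deep models, methods that approximate this property are far more expensive, which is one reason global explanations are harder to deploy.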

This study highlights the current primacy of engineering goals for explainability and how the needs of users and other stakeholders are more difficult to meet. It shows that engineers often use explainability techniques to identify where their models are going wrong and that they may not have sufficient incentives to share this information, which is perceived as sensitive and complex, more broadly. While users and regulators want to see the vulnerabilities of AI systems, they may also want to see plans to fix uncovered problems or mitigate any negative impacts. The findings of this study are consistent with other examples of XAI in practice. For example, one machine learning engineer’s account of explainability case studies documents her experiences of how they were used (successfully) for internal debugging and sanity checks, but not for user engagement.

Another 2020 study documents insights derived from interviews with 20 UX and design practitioners at IBM working on explainability for AI models and further explains the challenges practitioners face in meeting users’ needs. The study identifies a range of motivations for explainability that emerged from the participants’ focus on user needs, including to gain further insights or evidence about the AI system, to appropriately evaluate its capability, to adapt usage or interaction behaviors to better utilize the system, to improve performance, and to satisfy ethical responsibilities. The study participants said that realizing these motivations was difficult due to the inadequacy of current XAI techniques, which largely failed to live up to user expectations. Participants also described the challenge of needing to balance multiple organizational goals that can be at odds with explainability, including protecting proprietary data and providing users with seamless integration.

These studies highlight that while there are numerous different explainability methods currently in operation, they primarily map onto a small subset of the objectives outlined above. Two of the engineering objectives—ensuring efficacy and improving performance—appear to be the best represented. Other objectives, including supporting user understanding and insight about broader societal impacts, are currently neglected.

Bridging the gaps

The five recommendations below are intended primarily for organizations developing XAI standards and practices. They offer an initial roadmap, highlighting relevant research and priorities that can help address the limitations and risks of explainability.

  1. Diversify XAI objectives. Explainability techniques are currently developed and incorporated by machine learning engineers, and not surprisingly, their needs (and companies’ desire to avoid legal trouble) are being prioritized. Realizing a broader set of XAI objectives will require both greater awareness of their existence and a shift in incentives for accomplishing them. XAI standards and policy guidelines should explicitly include the needs of users, stakeholders, and impacted communities to incentivize this shift. Explainability case studies are one pedagogical tool that can help practitioners and educators understand and develop more holistic explainability strategies. Diverse organizational guidance documents, recommendations, and high-level frameworks can also help guide an organization’s executives and/or developers through key questions to support explainability that is useful and relevant to different stakeholders.
  2. Establish XAI metrics. While there has been some work done to evaluate AI explanations, most attempts are either computationally expensive or focus on only a small subset of what constitutes a “good explanation,” failing to capture other dimensions. Measuring effectiveness more holistically likely requires combining a comprehensive overview of XAI approaches, a review of the different forms of opacity, and the development of standardized metrics. In particular, the evaluation of explanations will need to take into account the specific contexts, needs, and norms in a given case, and use both quantitative and qualitative measures. Further work in this space will help hold organizations accountable and promote successful AI deployment.
  3. Mitigate risks. Explainability entails risks. Explanations may be misleading, deceptive, or be exploited by nefarious actors. Explanations can pose privacy risks, as they can be used to infer information about the model or training data. Explainability may also make it easier for proprietary models to be replicated, opening up research to competitors. Methods for both documenting and mitigating these risks are needed and emerging standards and policy guidelines should include practical measures to do so. For some high-stakes decisions, it may be better to forgo deep learning models and the need for explainability techniques.
  4. Prioritize user needs. So far, explainability has primarily served the interests of AI developers and companies by helping to debug and improve AI systems, rather than opening them to oversight or making the systems understandable to users. Prioritizing user needs has received some research attention, but user needs remain neglected. Key considerations in providing explanations to users include understanding the context of an explanation, communicating uncertainty associated with model predictions, and enabling user interaction with the explanation. Other user concerns include design practices for user experiences and accessibility. The field might incorporate decades of experience from the theory of risk communication. For example, this roadmap for risk communication with users developed by the Center for Long-Term Cybersecurity provides insights into the needs for two-way communication, accessible choice architecture, and protection for whistleblowers, among other mechanisms that help promote user interests.
  5. Recognize that explainability isn’t enough. Although explainability may be necessary to achieve trust in AI models, it is unlikely to be sufficient. Simply having a better understanding of how a biased AI model arrived at a result will do little to achieve trust. When students in England recently learned that they had been assigned standardized test scores based on a simple algorithm that had ascribed weight to schools’ historic performance and, thus, advantaged rich schools, they were outraged, sparking protests in cities around the country. Explainability will only result in trust alongside testing, evaluation, and accountability measures that go the extra step to not only uncover, but also mitigate exposed problems. And while explainability techniques will highlight elements of how a model works, users should not be expected to determine if that process is sufficient or to force changes when it is not. The precedent set by the 2017 Loomis v. Wisconsin case, in which the lack of explainability and potential racial bias in a criminal risk assessment algorithm were not seen as violating due process, underscores the gaps in accountability. Independent auditing and updated liability regimes, among other accountability measures, will also be needed to promote lasting trust.

Explainability is seen as a central pillar of trustworthy AI because, in an ideal world, it provides understanding about how a model behaves and where its use is appropriate. The prevalence of bias and vulnerabilities in AI models means that trust is unwarranted without sufficient understanding of how a system works. Currently, there is a significant discrepancy between the vision of explainability as a principle that reaches across domains and works for diverse stakeholders, and how it is being incorporated in practice. Bridging that gap requires greater transparency about the goals being optimized, and further work to ensure those goals align with the needs of users and the benefit of society at large.

Without clear articulation of the objectives of explainability from different communities, AI is more likely to serve the interests of the powerful. AI companies should clarify how they are using XAI techniques, to what end, and why, and make full explanations as transparent as possible. The entities currently developing XAI standards and regulations, including the National Institute of Standards and Technology, should take note of current limitations of XAI in practice and seek out diverse expertise about how to better align incentives and governance with a full picture of XAI objectives. It is only with the active involvement of many stakeholders, from the social sciences, computer science, civil society, and industry, that we may realize the goals of understandable, trustworthy, and controllable AI in practice.

Jessica Newman is a research fellow at the UC Berkeley Center for Long-Term Cybersecurity

For more about the practices and challenges of implementing AI principles, see the CLTC report, Decision Points in AI Governance: Three Case Studies Explore Efforts to Operationalize AI Principles.

IBM provides financial support to the Brookings Institution, a nonprofit organization devoted to rigorous, independent, in-depth public policy research.