Fairness in algorithmic decision-making

Editor's note:

This report from The Brookings Institution’s Artificial Intelligence and Emerging Technology (AIET) Initiative is part of “AI and Bias,” a series that explores ways to mitigate possible biases and create a pathway toward greater fairness in AI and emerging technologies.

Introduction

Algorithmic or automated decision systems use data and statistical analyses to classify people for the purpose of assessing their eligibility for a benefit or penalty. Such systems have traditionally been used for credit decisions and are now widely used for employment screening, insurance eligibility, and marketing. They are also used in the public sector, including for the delivery of government services, and in criminal justice sentencing and probation decisions.

Most of these automated decision systems rely on traditional statistical techniques like regression analysis. Recently, though, these systems have incorporated machine learning to improve their accuracy and fairness. These advanced statistical techniques seek to find patterns in data without requiring the analyst to specify in advance which factors to use. They will often find new, unexpected connections that might not be obvious to the analyst or follow from a common-sense or theoretical understanding of the subject matter. As a result, they can help to discover new factors that improve the accuracy of eligibility predictions and the decisions based on them. In many cases, they can also improve the fairness of these decisions, for instance, by expanding the pool of qualified job applicants to improve the diversity of a company’s workforce.

A significant new challenge with these machine learning systems, however, is ascertaining when and how they could introduce bias into the decision-making process. Several technical features of these systems might produce discriminatory decisions that are artifacts of the models themselves. The input data used to train the systems could underrepresent members of protected classes or be infected by past discriminatory practices. Consequently, the data could inadvertently reproduce or magnify historical patterns of bias.

Further, proxies for protected classes might be hiding undetected within other factors used in machine learning models. And despite their complexity, the factors these models rely on might still be too coarse to accurately capture the qualifications of members of protected classes.

“Undetected and unaddressed, these potential biases might prevent machine learning systems from fulfilling their promise of significantly improving the accuracy and fairness of automated decision systems.”

Undetected and unaddressed, these potential biases might prevent machine learning systems from fulfilling their promise of significantly improving the accuracy and fairness of automated decision systems. They might also expose the developers and users of these systems to legal liability for failure to comply with the anti-discrimination laws.

Legal background and developments

A range of U.S. laws forbids discrimination against protected classes in a variety of contexts, such as employment, credit, housing, public accommodation, public education, jury selection, use of genetic information, and health care and health insurance. Protected classes include: racial, ethnic, religious, and national minorities; women; seniors; and people with genetic vulnerabilities, disabilities, or pre-existing medical conditions.

Illegal discrimination can be intentional, where a company deliberately takes protected-class status into account to make decisions that disadvantage members of the protected class. Examples of this include a company explicitly excluding members of a racial group from its hiring pool, or a company consciously applying neutral decision criteria in ways that harm protected class members—such as only testing members of a protected class for a job skill. However, discrimination can also be completely unconscious when a neutral procedure produces decisions that disproportionately and systematically harm protected classes.

Anti-discrimination laws cover the use of automated decision systems whether based on traditional statistical techniques or machine learning algorithms. Even if developers deliberately avoid using variables for protected classes, such systems can still produce a disparate impact if they use variables that are correlated with both the output variable the system is trying to predict and a variable for protected-class status. Consequently, the legal risk in using these systems arises less from the possibility of intentional discrimination and more from exposure to claims of disparate impact.

“Even if developers deliberately avoid using variables for protected classes, [automated decision] systems can still produce a disparate impact.”

Under anti-discrimination laws, liability for disparate impact on protected classes is complex and controversial. Court decisions in recent years have been skeptical of these claims, and this jurisprudence has lessened the sense of urgency in the private sector to avoid disparate impact discrimination. The complexities and uncertainties increase when automated decision systems are involved.

When evaluating legal complaints for disparate impact, courts typically apply a three-stage burden-shifting framework. First, the plaintiff must show that a decision procedure causes a disproportionate harmful effect on a protected class. The burden then shifts to the defendant, who must show that the decision procedure serves a legitimate business purpose. Finally, the burden shifts back to the plaintiff, who must produce evidence of an available alternative that would achieve that purpose with a less harmful impact on the protected class.

Despite legal complexities, regulatory agencies have applied disparate impact analysis under current anti-discrimination laws to the most up-to-date machine learning techniques. The Consumer Financial Protection Bureau (CFPB) claimed jurisdiction over Upstart, a company that used alternative data and algorithms to assess lending decisions, with respect to compliance with the Equal Credit Opportunity Act. In its 2019 assessment, the CFPB found that the Upstart model approved more loans and lowered interest rates across all races and minority groups, thus creating no disparities that required further action under the fair lending laws.

Additionally, the Department of Housing and Urban Development this year brought a case against Facebook alleging that it violated the prohibition on housing discrimination because its machine learning algorithms selected the housing advertiser’s audience in a way that excluded certain minority groups. The department claimed Facebook’s algorithms functioned “just like an advertiser who intentionally targets or excludes users based on their protected class.”

Despite these initiatives, some reforms are under consideration that might weaken disparate impact liability. The Department of Housing and Urban Development has proposed to revise the burden-shifting framework for showing disparate impact. The agency proposes to provide companies with several new ways to defeat a prima facie claim that their risk models are causing a discriminatory effect. First, a defendant could escape liability if it can show that an in-house proprietary model predicts credit risk and that the major factors used are not proxies for protected classes. It can use a third-party validator for this showing. Second, housing lenders could escape disparate impact liability if they rely on third-party algorithms and use the model as intended, provided the third-party vendor is “a recognized third party that determines industry standards.” In this new framework, the plaintiff might never get a chance to show the availability of an equally effective, alternative decision model that produces less harmful outcomes for protected classes.

Other legal and policy developments are relevant to the use of algorithmic decision systems. Cases before the Supreme Court concern whether current law bars workplace discrimination on the basis of sexual orientation or gender identity. The law remains unclear concerning discrimination in other areas, like housing, public education, public accommodations, jury selection, and credit. The Equality Act, legislation that would expand protected classes to include sexual orientation and gender identity in a wide range of contexts, passed the House in 2019 and is pending in the Senate. While these Supreme Court cases and the Equality Act do not directly relate to AI-based automated decision systems, they would clarify which groups must not be harmed under the anti-discrimination laws and so affect the legal responsibility of developers and users of these systems to avoid discriminatory effects.

Congress is also considering legislation to mandate company consideration of algorithmic fairness. The Algorithmic Accountability Act of 2019 would require companies to assess their automated decision systems for risks of “inaccurate, unfair, biased, or discriminatory decisions.” They would also have to “reasonably address” the results of their assessments. The bill would empower the Federal Trade Commission to conduct rulemaking proceedings to resolve the crucial details of these requirements.

Recommendations for companies

This legal background provides the context for company consideration of disparate impact assessment of the automated decision systems they develop or use. A key initial point is that companies in all parts of the economy need to focus on the fairness of the algorithms they use. Algorithmic bias is not just a tech-sector problem.

The key recommendation in this paper is that all companies need to understand and inform the public when they use automated decision systems that produce disproportionately adverse effects on people in protected classes. It should not be a surprise to a company that its decision-making systems produce an adverse outcome for protected classes. Furthermore, it should not be left to the public to discover these effects through random personal experiences, as allegedly happened in the case of Apple’s venture into the credit card business with Goldman Sachs.

“[A]ll companies need to understand and inform the public when they use automated decision systems that produce disproportionately adverse effects on people in protected classes.”

In the three-stage burden-shifting analysis described earlier, the third step—which investigates whether there are alternative models or improvements to the existing model that would achieve the legitimate objective with less of a disparate impact—is the most essential. This investigation of alternatives is the vital way to make progress in reducing disparate impact.

In the employment context, for instance, a company might assess whether its automated screening or hiring algorithm complies with the 80% (or “four-fifths”) rule of thumb commonly used in this context. For example, if the algorithm selects 10% of white applicants to be hired, this rule of thumb holds that no fewer than 8% of African American applicants should be selected. If the algorithm does not meet this rule of thumb, then the company can investigate whether there are other algorithms that effectively screen job applicants while meeting this standard.
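
As a rough illustration of how such a check can be automated, the sketch below (written in Python, with made-up applicant data and a hypothetical adverse_impact_check helper) computes per-group selection rates and flags any group whose rate falls below 80% of the highest group’s rate. It is a minimal sketch of the rule of thumb described above, not a substitute for a full legal or statistical analysis.

```python
from collections import defaultdict

def adverse_impact_check(applicants, threshold=0.8):
    """Check each group's selection rate against the 80% rule of thumb.

    `applicants` is a list of (group, selected) pairs, where `selected`
    is True if the algorithm chose the applicant. Returns per-group
    selection rates and the groups falling below the threshold.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [selected, total]
    for group, selected in applicants:
        counts[group][0] += int(selected)
        counts[group][1] += 1

    rates = {g: sel / total for g, (sel, total) in counts.items()}
    benchmark = max(rates.values())  # highest selection rate of any group
    flagged = [g for g, r in rates.items() if r < threshold * benchmark]
    return rates, flagged

# Illustrative data: 10% of white applicants and 6% of Black applicants selected.
applicants = [("white", i < 10) for i in range(100)] + \
             [("Black", i < 6) for i in range(100)]
rates, flagged = adverse_impact_check(applicants)
print(rates)    # {'white': 0.1, 'Black': 0.06}
print(flagged)  # ['Black'] -- 0.06 is below 0.8 * 0.10, so the rule of thumb is not met
```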

Independent researchers often perform these disparate impact analyses using publicly available information. ProPublica found that the COMPAS risk score, used to inform criminal sentencing and parole decisions, erroneously flagged Black defendants as high risk at nearly twice the rate it did white defendants. In addition, Upturn demonstrated that Facebook’s machine learning ad targeting practices are disproportionately skewed along gender and racial lines for employment and housing ads.

Independent researchers also found bias in software widely used to allocate health care to hospital patients. For example, one algorithm assigned patients to risk categories based on their average health care costs. But because of racial disparities in access to health care, equally sick Black patients do not receive as much care as white patients, and the algorithm unintentionally exacerbated this racial disparity. Using the algorithm, Black patients accounted for only 17.7% of patients recommended for extra care, but using an unbiased algorithm, 46.5% of patients assigned extra care would have been Black.
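
The mechanism the researchers identified can be illustrated with a toy simulation. The sketch below uses entirely synthetic data, not the study’s data or the actual algorithm; it assumes two groups that are equally sick but where, because of unequal access to care, one group incurs less cost at the same level of illness. Ranking patients by predicted cost then gives that group a much smaller share of the “extra care” slots than ranking by illness itself would.

```python
import random

def care_shares(patients, top_frac=0.2):
    """Share of group 'B' among the top `top_frac` of patients when ranked
    by predicted cost versus by actual illness burden."""
    k = int(len(patients) * top_frac)
    shares = {}
    for criterion in ("cost", "illness"):
        top = sorted(patients, key=lambda p: p[criterion], reverse=True)[:k]
        shares[criterion] = sum(p["group"] == "B" for p in top) / k
    return shares

rng = random.Random(0)
patients = []
for _ in range(5000):
    group = rng.choice(["A", "B"])
    illness = rng.randint(0, 10)                # both groups are equally sick on average
    spend_rate = 1.0 if group == "A" else 0.6   # group B incurs less cost when equally sick
    patients.append({"group": group,
                     "illness": illness,
                     "cost": illness * spend_rate + rng.random()})

print(care_shares(patients))
# Ranking on cost gives group B a far smaller share of the extra-care slots
# than ranking on illness itself, even though illness is identically distributed.
```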

As discussed in the next section, government agencies can and should conduct disparate impact assessments. The key recommendation for the private sector, however, is that software developers should perform these disparate impact analyses themselves before using their algorithms or making them available for others to use. They should not rely on independent researchers or government agencies to detect disparate impact after the algorithms are already in widespread use.

“[S]oftware developers should perform disparate impact analyses themselves before using their algorithms or making them available for others to use.”

If the assessment shows that the model does have a disparate impact, the next step is to determine what should be done. In general, companies can engage researchers who are aware of how disparities arise in particular contexts and work with them to address any disparities the model produces before putting the model into use.

The company developing the software should assess the different ways in which disparate impact could be an artifact of the model itself. Models can introduce bias through the selection of target variables, through the use of biased or unrepresentative training data, or through the use of insufficiently detailed and granular factors. Each of these sources of model bias can be examined and technical fixes sought. For instance, oversampling and retroactive data correction could be used to adjust for bias in the training data.
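
As a simple illustration of the training-data fix mentioned above, the sketch below rebalances a training set by randomly duplicating records from underrepresented groups before a model is fit. The function name and the group field are illustrative assumptions, and oversampling addresses under-representation only; it cannot repair labels that already encode discriminatory outcomes.

```python
import random

def oversample_underrepresented(rows, group_key="group", seed=0):
    """Randomly duplicate records from underrepresented groups (sampling
    with replacement) until every group appears as often as the largest one.

    `rows` is a list of dicts; `group_key` names the protected-class field.
    """
    rng = random.Random(seed)
    by_group = {}
    for row in rows:
        by_group.setdefault(row[group_key], []).append(row)

    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Sample extra records with replacement to reach the target count.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    rng.shuffle(balanced)
    return balanced

# Illustrative use: group "B" is underrepresented 4-to-1 in the raw training data.
raw = [{"group": "A", "feature": i} for i in range(80)] + \
      [{"group": "B", "feature": i} for i in range(20)]
balanced = oversample_underrepresented(raw)
print(sum(r["group"] == "B" for r in balanced))  # 80 group-"B" records after rebalancing
```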

Amazon tried this approach in its attempt to develop a hiring algorithm for software engineers. It experimented with using historical data to train an automated method of selecting promising applicants, did disparate impact analyses, and found that the resulting algorithm disproportionately rejected women. While the company fixed the factors that were leading to this disparate impact, it encountered others it could not fix. Ultimately, Amazon abandoned its attempt at recruitment automation.

The developer of the biased health-care software did not do this, however, even though “routine statistical tests” would have revealed its racial disparities. Hospitals used it and similar algorithms to manage health care for 200 million people before researchers “stumbled onto the problem.” Now, the health-care software company is working with researchers to develop an algorithm that is both better at predicting who will need health care and less biased, but this work of repairing disparate impacts should have been done at the development stage.

Sometimes generating new data will increase the accuracy of the model in a way that reduces its disparate impact. This can happen when members of protected classes have not historically participated in the relevant activities due to past discriminatory practices and have not left data records that can be used for statistical analyses. In these circumstances, it is more expensive to find qualified members of a protected class, and the economic payoff for the company does not always justify these additional costs.

If, however, technical fixes reduce the disparate impact in the model at reasonable cost, the company should adopt the revised model. There might be exposure to legal liability for disparate impact discrimination if a company fails to adopt reasonable technical improvements to its decision-making systems that would achieve its business or organizational objectives with less of an impact on protected classes. It is also the right thing to do.

However, there is not always a technical fix. The disparate impact in the model might reflect inequalities in the real world. Even in that case, however, a number of adjustments to the model are possible to reduce decision-making disparities and move the output of the model toward statistical parity or equal error rates. These adjustments might involve taking direct cognizance of protected-class characteristics or factors that are substantial proxies for them.
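
For concreteness, the sketch below computes the two kinds of disparity just mentioned for a set of hypothetical decisions: the gap in selection rates across groups (statistical parity) and the gaps in false positive and false negative rates (equal error rates). The data format and function name are assumptions for illustration; choosing which of these measures, if any, to equalize is the normative judgment discussed next.

```python
def disparity_report(records):
    """Summarize two fairness notions across groups.

    `records` is a list of (group, predicted, actual) tuples with boolean
    predicted/actual outcomes. Returns per-group selection rates (for
    statistical parity) and false positive / false negative rates (for
    equal error rates).
    """
    stats = {}
    for group, pred, actual in records:
        s = stats.setdefault(group, {"sel": 0, "n": 0, "fp": 0, "neg": 0,
                                     "fn": 0, "pos": 0})
        s["n"] += 1
        s["sel"] += pred
        if actual:
            s["pos"] += 1
            s["fn"] += not pred   # qualified but rejected
        else:
            s["neg"] += 1
            s["fp"] += pred       # unqualified but selected

    return {
        group: {
            "selection_rate": s["sel"] / s["n"],
            "false_positive_rate": s["fp"] / s["neg"] if s["neg"] else None,
            "false_negative_rate": s["fn"] / s["pos"] if s["pos"] else None,
        }
        for group, s in stats.items()
    }

# Illustrative use: each tuple is (group, model says "yes", true outcome).
records = [("A", True, True), ("A", True, False), ("A", False, False),
           ("B", False, True), ("B", True, False), ("B", False, False)]
print(disparity_report(records))
```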

It is important to recognize that the decision to use or not to use these further adjustments is not itself a technical fix. These model alterations would not be aimed at reducing the bias introduced into decision-making by the statistical model itself. They are meant to substantively alter the outcome of the decision-making system, and as such constitute normative judgments that have business and legal implications. While they will improve outcomes for protected classes, they will inevitably decrease the accuracy of the resulting decisions, which in practice would mean less satisfactory achievement of business or organizational objectives. In addition, they could also create legal risk to the extent that they are viewed themselves as instances of discriminatory treatment on the basis of prohibited characteristics.

The technical and legal issues involved in conducting disparate impact analyses are formidable, and not all companies have the expertise or resources to conduct these analyses themselves. But companies without that expertise are also unlikely to develop sophisticated algorithmic decision systems in-house; they will obtain them from outside developers. Developers of these systems should therefore produce disparate impact analyses to accompany their systems and make them available to their customers, just as they produce and distribute validation studies on the accuracy and reliability of their systems.

An additional step for developers and users of these systems would be the disclosure of information to outside parties to allow independent validation and disparate impact assessment. Analysis by independent organizations provides an additional level of credibility. Much can be done with publicly available information, but the key information is often proprietary and of great commercial value. Companies should consider selecting outside researchers to work with in connection with disparate impact assessments. Facebook is experimenting with such a system in its Social Science One initiative in connection with election issues, and it could extend such an open system to the assessment of disparate impacts.

Longer term, the marketplace might develop independent auditing organizations for disparate impact assessments similar to those that assess the financial well-being of public companies.

Finally, companies should consider performing these assessments for other consequential activities not clearly covered under current anti-discrimination laws, such as marketing campaigns for financial products or search results based on a person’s name or occupation, as well as for vulnerable classes not clearly protected by current law, such as those defined by sexual orientation and gender identity.

Recommendations for government

The measures described in the previous section can be done on a voluntary, market-oriented basis, without the need for government to do anything at all. What can government do?

Government agencies, in cooperation with industry, can also conduct disparate impact studies in areas within their jurisdiction. In one recent study, the Federal Reserve Board showed that credit scores are systematically lower for African Americans and Hispanics, who, as a result, are more likely to be denied credit and to pay higher interest rates. Additionally, the Federal Trade Commission showed that African Americans and Hispanics receive systematically lower credit-based automobile insurance scores and so face higher automobile insurance premiums.

However, agencies might not have the technical expertise to assess decision-making systems within their area of jurisdiction that use the newest machine learning techniques. As suggested by the Obama administration, “[Government agencies] should expand their technical expertise to be able to identify practices and outcomes facilitated by big data analytics that have a discriminatory impact on protected classes, and develop a plan for investigating and resolving violations of law in such cases.”

The Trump administration is moving in the same direction by encouraging federal agencies to examine both regulatory and non-regulatory options for dealing with AI issues in the areas under their jurisdiction. It will soon release an executive order detailing this approach to AI issues and regulation.

Other agency actions might include using resources and industry relationships to facilitate companies’ conduct of disparate impact assessments, providing incentives for developers to conduct these assessments for their products, encouraging developer interaction with outside auditors, and pushing for the disclosure of information for independent disparate impact audits.

Agencies can act only under their existing authority. But to make real progress, Congress should consider new legislation that would require companies to perform disparate impact assessments. The courts’ narrowing of disparate impact liability and the difficulty of enforcement against algorithmic systems have lowered developers’ incentive to produce these assessments for defensive reasons. If policymakers want developers and users to conduct these assessments with a frequency proportionate to the risk of increasing protected-class disparities, they will need to require them.

The Algorithmic Accountability Act does this, but it puts significant new responsibility for these assessments in the hands of the Federal Trade Commission. A different approach would be to amend the existing anti-discrimination statutes to require disparate impact assessments for automated decision systems used in the contexts covered by these laws. The assessments should be provided to the appropriate regulatory agency charged with enforcing the anti-discrimination laws and to the public. Each agency could also be assigned the responsibility to conduct its own disparate impact assessment, and to have new authority, if necessary, to obtain data from developers and companies for this purpose. Agencies could also be authorized to work with outside researchers to conduct these assessments, and to approve certain researchers to receive data from developers and companies to conduct these assessments. Finally, the agencies might be required to work with developers and companies to determine which data might be revealed to the public at large in ways that would not compromise privacy or trade secrets so that independent researchers could conduct their own assessments.

“Longer term, an improvement in the accuracy and fairness of algorithmic systems depends on the creation of more adequate datasets.”

Longer term, an improvement in the accuracy and fairness of algorithmic systems depends on the creation of more adequate datasets, which can only be done through real-world action. But the creation of new and more adequate data might involve expensive data gathering that would not have a private-sector payoff. It might also involve increasing benefit eligibility under existing algorithmic standards in order to reach qualified members of protected classes. For instance, credit companies might extend loans to members of protected classes who just miss traditional eligibility standards and examine the results to better identify creditworthiness.

There are limits, however, to resolving longstanding disparities through adjustments in algorithms. As many analysts have pointed out, substantive reforms of housing policy, criminal justice, credit allocation, insurance, and employment practices, to name just a few, will be needed to reduce widespread inequities that have persisted far too long.[16] Good, fairness-aware algorithmic practices can go only so far in accomplishing the work of social and economic justice.

Conclusion

The promise of automated decision systems—and especially the new machine learning versions—is to dramatically improve the accuracy and fairness of eligibility determination for various private-sector and public-sector benefits. The danger is hidden exacerbation of protected-class disparities.

The way forward is to look and see what these systems are doing. As Louis Brandeis said, “Sunlight is the best disinfectant.” We cannot take steps to remedy potential bias in these systems if we do not examine them for discriminatory effects. Every business manager knows that it is impossible to manage what you do not measure. So, the first essential step is to measure the extent to which these systems create disparate impacts.

This recommendation follows the tradition of disclosure and assessment as the way to improve the operation of organizational systems. Progress in eliminating protected-class disparities begins with awareness. Difficult conversations might lie ahead in determining what to do if these assessments reveal disparities. But if we do not face them directly, they will only get worse and will be all the more damaging for remaining hidden.


The Brookings Institution is a nonprofit organization devoted to independent research and policy solutions. Its mission is to conduct high-quality, independent research and, based on that research, to provide innovative, practical recommendations for policymakers and the public. The conclusions and recommendations of any Brookings publication are solely those of its author(s), and do not reflect the views of the Institution, its management, or its other scholars.

Microsoft provides support to The Brookings Institution’s Artificial Intelligence and Emerging Technology (AIET) Initiative, and Amazon, Apple, and Facebook provide general, unrestricted support to the Institution. The findings, interpretations, and conclusions in this report are not influenced by any donation. Brookings recognizes that the value it provides is in its absolute commitment to quality, independence, and impact. Activities supported by its donors reflect this commitment.