Are criminal risk assessment scores racist?

Imagine you were found guilty of a crime and were waiting to learn your sentence. Would you rather have your sentence determined by a computer algorithm, which dispassionately weights factors that predict your future risk of crime (such as age or past arrests), or by the subjective assessment of a judge? And would that choice change if you were a different race?

Technology is often held up as a way to reduce racial disparities in the criminal justice system. If existing disparities are due at least in part to the racial biases of witnesses, police, and judges, then replacing some human judgments with computer algorithms that estimate crime risk could produce a fairer system. But might those algorithms also exhibit racial bias? This is a good question, but not one that’s easy to answer using existing data.

A ProPublica story in May claimed that the risk scores calculated by the private firm NorthPointe are plagued by racial bias, systematically giving higher risk scores to blacks than to otherwise similar whites. If true, this is an important problem, as courts all over the country use risk scores to determine bail, sentencing, parole, and more. The story has received considerable media attention and was even cited by the Wisconsin Supreme Court in a legal decision limiting the use of risk assessments. Yet it contains several important errors of reasoning that call its conclusions into doubt.

The ProPublica study cites a disparity in “false-positive rates” as evidence of racial bias: black defendants who did not reoffend were more likely to have been classified as high risk than white defendants who did not reoffend. As noted in NorthPointe’s response, and explained in a recent column by Robert Verbruggen, these statistics are dangerously misleading. Any group that has higher recidivism rates (and therefore higher risk scores, on average) will mechanically have a higher false-positive rate, even if the risk score is completely unbiased.

(The basic intuition is this: the false-positive rate is the number of people labeled high risk who don’t reoffend, divided by the total number of people who do not reoffend. In a group with high recidivism rates, the numerator will be larger because the pool of people labeled high risk is bigger, and the denominator will be smaller because there are fewer people who do not reoffend. The result is that the ratio of these numbers is always larger than it is for low-recidivism groups.)
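To make the arithmetic concrete, here is a minimal sketch in Python. Every number in it is invented for illustration; none of it is ProPublica or NorthPointe data. It assumes a score that is perfectly calibrated and race-blind (a person's score simply equals his or her true probability of reoffending), a hypothetical "high risk" cutoff of 0.5, and two hypothetical groups that differ only in how many of their members fall at each risk level.

```python
# Hypothetical illustration: all figures are made up, not real recidivism data.
# Assumption: the score is perfectly calibrated, so score == true reoffense probability,
# and it is applied identically to both groups.

def false_positive_rate(share_by_risk, threshold=0.5):
    """Share of non-reoffenders who were labeled high risk.

    share_by_risk maps a true reoffense probability to the share of the
    group at that risk level (shares sum to 1).
    """
    # Numerator: labeled high risk (score >= threshold) and did not reoffend.
    false_positives = sum(
        share * (1 - risk)
        for risk, share in share_by_risk.items()
        if risk >= threshold
    )
    # Denominator: everyone who did not reoffend.
    non_reoffenders = sum(share * (1 - risk) for risk, share in share_by_risk.items())
    return false_positives / non_reoffenders


def recidivism_rate(share_by_risk):
    """Overall share of the group that reoffends."""
    return sum(share * risk for risk, share in share_by_risk.items())


# Two hypothetical groups with the same two risk levels but different mixes.
low_recidivism_group = {0.2: 0.8, 0.6: 0.2}
high_recidivism_group = {0.2: 0.4, 0.6: 0.6}

for name, group in [("low", low_recidivism_group), ("high", high_recidivism_group)]:
    print(
        f"{name}-recidivism group: "
        f"recidivism {recidivism_rate(group):.1%}, "
        f"false-positive rate {false_positive_rate(group):.1%}"
    )
```

In this invented example the low-recidivism group reoffends 28 percent of the time and has a false-positive rate of about 11 percent, while the high-recidivism group reoffends 44 percent of the time and has a false-positive rate of about 43 percent. The score treats every individual identically, yet the false-positive rates differ sharply.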

It seems counterintuitive, but disparities in false positives don’t tell us anything about racial disparities in the algorithm. Disparate false-positive rates will be present every time there are disparate rates of reoffending, regardless of racial bias and regardless of whether the risk score is made by a computer algorithm or by the subjective assessment of a judge.

However, even if we look at the correct set of numbers, we face a bigger problem: risk scores influence sentencing, and sentencing influences recidivism. Consider a defendant who is ordered by the courts to undergo substance abuse counseling due to his high risk score. If he doesn’t reoffend, is this because the risk score was wrong, or because the substance abuse counseling was effective? Consider a second defendant who received a prison sentence as a result of her high risk score. If she doesn’t reoffend, is this because the risk score was wrong, or because she was in prison until she was too old for crime? Recidivism rates do not tell us what a person’s propensity to commit another crime was at the time the risk score was calculated, and therefore they have limited use in determining the accuracy of those scores. A very nice paper by Shawn Bushway and Jeffrey Smith makes this point at length.

While methodological issues call ProPublica’s conclusions into question, potential racial bias in risk assessment remains an important issue. Close to 20 states are using risk assessment to help determine sentencing in at least some jurisdictions, and risk assessments in bail and parole are even more common. Considering that many of the inputs to risk assessments, such as past arrests, are subject to racially disparate policing practices, it would not be surprising if risk scores carried some of this bias over. These are complicated issues, and scholars such as Richard Berk and Sandra Mayson provide deeper analysis on how we should think of fairness and justice in risk assessment.  But one of the most important policy questions is simple: do risk assessments increase or decrease racial disparities compared to the subjective decisions of judges?

An ideal approach to answering this question would be an experiment in which some judges are randomly assigned to use risk assessments as part of their decisions (their defendants are the treatment group), and some judges to operate as before (their defendants are the control group).  We could then compare racial discrepancies in sentencing for the treatment group and the control group, to determine the effect of incorporating risk assessment scores in the decision-making process. We could use the same method to consider this policy tool’s effects on recidivism, incarceration rates, and any other outcomes we care about.
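The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (race, judge_uses_risk_score, sentence_months), and the choice of sentence length as the outcome are all hypothetical, and the data are invented.

```python
from statistics import mean

# Hypothetical records: every value below is invented for illustration only.
records = [
    {"race": "black", "judge_uses_risk_score": True, "sentence_months": 14},
    {"race": "black", "judge_uses_risk_score": True, "sentence_months": 10},
    {"race": "white", "judge_uses_risk_score": True, "sentence_months": 11},
    {"race": "white", "judge_uses_risk_score": True, "sentence_months": 9},
    {"race": "black", "judge_uses_risk_score": False, "sentence_months": 16},
    {"race": "black", "judge_uses_risk_score": False, "sentence_months": 12},
    {"race": "white", "judge_uses_risk_score": False, "sentence_months": 10},
    {"race": "white", "judge_uses_risk_score": False, "sentence_months": 10},
]


def racial_gap(rows):
    """Mean sentence for black defendants minus mean sentence for white defendants."""
    black = [r["sentence_months"] for r in rows if r["race"] == "black"]
    white = [r["sentence_months"] for r in rows if r["race"] == "white"]
    return mean(black) - mean(white)


def effect_on_disparity(rows):
    """Racial gap in the treatment group minus racial gap in the control group.

    A negative value means the gap is narrower where judges use risk scores;
    a positive value means it is wider.
    """
    treatment = [r for r in rows if r["judge_uses_risk_score"]]
    control = [r for r in rows if not r["judge_uses_risk_score"]]
    return racial_gap(treatment) - racial_gap(control)


print(effect_on_disparity(records))
```

Because judges, not defendants, are the unit of random assignment, a real analysis would also need to account for the grouping of defendants within judges when gauging statistical uncertainty; this sketch shows only the point estimate.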

The expanding role of technology in criminal justice deserves a hard look and vigorous debate.  But condemning risk assessments as racist based on weak evidence does nothing to advance the cause of racial equality.