Disproportionality in student discipline: Connecting policy to research

Major racial disparities in student discipline rates have been documented for decades. Most recently, the 2013-14 Civil Rights Data Collection (CRDC) documented that black students, who make up 16 percent of enrollment, accounted for 40 percent of suspensions nationally. The Obama Administration made these disparities a major policy priority, expanding the CRDC and releasing policy guidance on discrimination in school discipline. Betsy DeVos told reporters just two months ago that she is “looking closely” at the rules, and advocacy efforts on both sides are heating up.

At the same time, the research base is growing. At the start of 2017, Steinberg and Lacoe published a useful overview of what we do know, and more of what we don’t, about school discipline reform. New evidence from Louisiana and North Carolina bolsters the case for the premise of the guidance, suggesting that more severe disciplinary outcomes for black students are due in part to discriminatory practice, intended or not. The guidance suggests reducing the use of suspension, but research on Philadelphia’s recent efforts to do so points to major implementation challenges.

I first describe the policy debate, then how these new studies inform it.

The policy context: federal guidance on disproportionality in school discipline

In 2014, the Department of Education and Department of Justice jointly issued a Dear Colleague Letter (DCL) on racial disparities in school discipline. The Department of Education is now considering pulling this guidance document, prompting enormous debate in the education policy world.

The letter states its aim as helping public schools administer discipline without discriminating on the basis of race. It then summarizes recent racial disparities in discipline, as reported in the Civil Rights Data Collection (CRDC). It acknowledges that different rates of discipline per se do not reveal discrimination, citing scholarly research and past investigations by the Departments to support the assertion that the disparities “are not explained by more frequent or more serious misbehavior by students of color.”

Instead, it concludes, “racial discrimination in school discipline is a real problem.”

The guidance describes the potential for exclusionary discipline (e.g., suspension and expulsion) to reduce instructional time and lead to adverse outcomes. It correctly describes correlations between student experience of exclusionary discipline and later negative outcomes. While the guidance does refer to the “school-to-prison pipeline,” it avoids causal language in its summary of the literature.

The bulk of the document explains the two relevant legal concepts, different treatment and disparate impact, with examples. The letter then explains the investigative process, emphasizing the importance of record keeping, and describes potential remedies that could result.

Finally, the letter contains a significant appendix of “illustrative” specific suggestions for policy and practice that could serve to help states and districts avoid violations, urging schools to reduce the use of suspension and other forms of exclusionary discipline, focusing instead on positive approaches.

I first discuss how recent research relates to the argument over the existence and extent of discrimination, and then to the controversy over the practice recommendations.

Empirical challenges in documenting discrimination in student discipline

Quantifying discrimination is a notoriously difficult empirical task: researchers essentially attempt to exhaust all other plausible explanations for observed gaps in outcomes across whatever groups are involved. The most convincing evidence of discrimination in other contexts, like labor and housing markets, tends to come from experimental methodologies in which researchers can manufacture identical cases to compare, differing only along the dimension being examined for potential discrimination, such as race or gender.

Researchers seeking to understand how much of the race gap in student discipline may be attributable to discrimination face a more complex version of this challenge. One key missing variable is actual student behavior: researchers observe only the infraction as recorded by school personnel, who could exhibit bias in how they map behavior to infractions even if not in how infractions map to punishment. Further complicating matters, the school environment itself influences behavior. So even if researchers with access to exceptionally rich data were to conclude that gaps in discipline were fully explained by gaps in behavior rather than simply recorded infractions, they would not necessarily be able to rule out discrimination causing those gaps in behavior in the first place.

The DCL cites multiple studies in its assertion that differences in behavior do not explain the discipline gap. The work of Skiba et al. (2011) is among the most convincing cited. The authors draw upon administrative data from 436 schools across the country in 2005-06, looking at differences in discipline for “minor misbehavior.” They find black and Hispanic students were more likely to be disciplined conditional on receiving a referral for “minor misbehavior” than were their white peers. Steinberg and Lacoe note that a shortcoming of this work is its inability to control for students’ prior disciplinary records, suggesting that prior records might explain justifiable differences in severity of punishment for similar infractions; in that case, the results would overestimate discrimination’s impact. But one could also argue that the behavior for which the student was referred was itself an outgrowth of discrimination, and that Skiba et al. underestimate the role of discrimination.

The most convincing evidence that discrimination contributes to the gap may well come from work on implicit bias more generally, rather than specifically in a school discipline context; it is difficult to imagine that bias applies in the range of educational contexts documented but not in the realm of discipline. This literature does not give a clear indicator of just how much of the gap might be explained by bias, however. And in the context of broader racial inequality, it is unlikely that student behavior would be identical across racial groups even without discrimination.

As much as empirical challenges may seem to render this debate theoretical, it is a critical one for policy: the authority to issue federal guidance to schools on discipline disparities comes directly from administrative authority to enforce the Civil Rights Act, which requires either different treatment or disparate impact. The DCL notes that the Departments’ investigations of civil rights violations have revealed discrimination in student discipline—and that these investigations use aggregate gaps alongside an array of other evidence. Given the inherent difficulties in large-scale quantitative research on this topic, this internal experience may provide stronger evidence of discrimination than the research literature.

According to the DCL, “The Departments also may initiate investigations based on public reports of racial disparities in student discipline combined with other information.” It is therefore important to understand not just whether discrimination exists, but how it relates to aggregate racial gaps in discipline rates in this legal context.

New updates to the literature on discrimination in discipline

New studies from 2017 provide further suggestive evidence that discrimination contributes to the discipline gap. Like the literature upon which they build, they do not come close to suggesting that discrimination is responsible for the entire gap.

Barrett, McEachin, Mills and Valant’s new Louisiana study draws on statewide student-level data from 2000 to 2013. Much of what they find corroborates existing empirical work, with the same caveats in interpretation: black students are more likely to be suspended, even conditional on eligibility for free or reduced-price lunch. Black students are 10 percentage points more likely to be suspended in a given year, and low-income ones 6 percentage points likelier, controlling for school-grade-year fixed effects. That is, a poor black student is 10 percentage points likelier than a poor white student in the same school, grade-level, and year to be suspended; he is 16 percentage points likelier than a white student who is not free lunch eligible. In other words, racial disparities are not solely a function of differences in family income by race. This finding is broadly consistent with Skiba et al. (2002), who studied a large Midwestern district, and with Raffaele Mendez and Knoff (2003), who studied a Florida district, both in the mid-1990s.
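
The arithmetic behind the 10 and 16 percentage point comparisons can be made explicit with a minimal sketch. It assumes the two effects are simply additive, with no interaction between race and income, which is my simplification for illustration and not necessarily the authors' exact specification. The function and variable names are hypothetical.

```python
# Coefficients taken from the figures quoted above, as proportions.
# Assumption: effects are additive (no race-by-income interaction term).
BLACK_COEF = 0.10       # black students: +10 percentage points
LOW_INCOME_COEF = 0.06  # low-income students: +6 percentage points


def predicted_gap(black_a, low_income_a, black_b, low_income_b):
    """Gap in predicted suspension probability (student a minus student b),
    in percentage points. Both students share a school, grade, and year,
    so the school-grade-year fixed effect cancels out of the difference."""
    diff = (BLACK_COEF * (black_a - black_b)
            + LOW_INCOME_COEF * (low_income_a - low_income_b))
    return round(diff * 100, 1)


# Poor black student vs. poor white student: only race differs.
gap_same_income = predicted_gap(1, 1, 0, 1)

# Poor black student vs. non-poor white student: race and income both differ.
gap_diff_income = predicted_gap(1, 1, 0, 0)

print(gap_same_income, gap_diff_income)
```

Under this additive assumption the two gaps are 10 and 16 percentage points, matching the comparisons in the text.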

Barrett et al. conduct a similar exercise predicting the length of a suspension, in days. Black students were predicted to have an additional .099 days per suspension, off a base of 2.9 days as the mean suspension for whites. Previous work in Arkansas, also controlling for school fixed effects, estimated that black students received about an additional .07 days per suspension. The authors then attempt to get closer to studying disparities in discipline conditional on student behavior by comparing outcomes for black and white students who participated in the same fight. They find this cut the additional days of suspension predicted for black students roughly in half; black students still received slightly longer suspensions, by about .04 to .05 days, than their white counterparts in these cases. This result is small in magnitude but statistically significant.
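
To put "small in magnitude" in perspective, the estimates can be expressed as a share of the 2.9-day mean suspension for white students. This is a rough sketch: applying the same 2.9-day baseline to the same-fight comparisons is my assumption, since the mean suspension length for fights may differ. The names below are hypothetical.

```python
WHITE_MEAN_DAYS = 2.9  # mean suspension length for white students, from the text


def pct_of_baseline(extra_days, baseline=WHITE_MEAN_DAYS):
    """Express an estimated gap in suspension days as a percent of the baseline."""
    return round(100 * extra_days / baseline, 1)


overall_pct = pct_of_baseline(0.099)  # gap across all suspensions

# Same-fight comparisons; reusing the 2.9-day baseline here is an assumption.
same_fight_low_pct = pct_of_baseline(0.04)
same_fight_high_pct = pct_of_baseline(0.05)

print(overall_pct, same_fight_low_pct, same_fight_high_pct)
```

On these assumptions, the overall gap is roughly 3 percent of a typical suspension, and the same-fight gap is between 1 and 2 percent.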

Using student-level data, they are able to surmount a criticism of the Skiba et al. study by limiting the analysis of students involved in the same fight to a sample in which neither student had a previous suspension, so the only comparisons involve students with similar disciplinary records. They obtain qualitatively similar results for this sample, suggesting that discrimination rather than unobserved differences in student experience drives the small but statistically significant result. Some advocates would argue this work provides a higher level of causal evidence on the existence of discrimination; others would argue it shows that discrimination only explains part of the gap. Indeed, both interpretations are correct.

The statewide, student-level sample for the Louisiana study further allows Barrett et al. to decompose variation in disciplinary outcomes into what they find statewide (across all districts), within a district, and within a school. They find over half the racial gap in discipline rates is generated within schools; that is, it is not simply a story in which black and white students attend schools with different discipline policies. This is notably different from the previous work on Arkansas.
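
The logic of separating a within-school gap from the raw gap can be sketched with toy numbers. The records below are made up purely for illustration and do not come from any study. The method, subtracting each school's averages and fitting a one-variable least-squares slope, is a generic school fixed effects calculation and not necessarily the decomposition Barrett et al. use.

```python
from collections import defaultdict

# Made-up toy records: (school, is_black, was_suspended). Not real data.
# School A is mostly black with no internal gap; school B is mostly white.
records = (
    [("A", 1, 1)] * 4 + [("A", 1, 0)] * 4 + [("A", 0, 1)] * 1 + [("A", 0, 0)] * 1
    + [("B", 1, 1)] * 1 + [("B", 1, 0)] * 1 + [("B", 0, 1)] * 1 + [("B", 0, 0)] * 7
)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Raw gap: black suspension rate minus white suspension rate, pooling schools.
overall_gap = (mean(y for _, b, y in records if b == 1)
               - mean(y for _, b, y in records if b == 0))

# Within-school gap: remove each school's averages (a school fixed effect),
# then compute the least-squares slope on the demeaned data.
by_school = defaultdict(list)
for school, b, y in records:
    by_school[school].append((b, y))

numerator = denominator = 0.0
for rows in by_school.values():
    b_bar = mean(b for b, _ in rows)
    y_bar = mean(y for _, y in rows)
    for b, y in rows:
        numerator += (b - b_bar) * (y - y_bar)
        denominator += (b - b_bar) ** 2

within_gap = numerator / denominator
within_share = within_gap / overall_gap

print(f"raw gap: {overall_gap:.3f}, within-school gap: {within_gap:.3f}, "
      f"share within schools: {within_share:.1%}")
```

In this toy example the within-school gap is smaller than the raw gap because part of the raw gap reflects black students being concentrated in the school that suspends more students overall. The remainder is the portion that arises between students attending the same school.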

Additional suggestive evidence on the role of discrimination in the gap comes from Lindsay and Hart (2017). They find black students in North Carolina were less likely to be subject to exclusionary discipline when they had black teachers rather than white teachers, even within the same school. This was true for suspensions and expulsions, and the result was especially strong for office referrals for “willful defiance”—a more subjectively defined infraction.

The DCL’s policy and practice recommendations

Opponents of the guidance have focused more on its recommendations than its premise that schools discriminate. The Appendix lists many recommendations for policy and practice, with the general theme of reducing the use of exclusionary discipline (suspension and expulsion) and providing discipline through positive approaches. While the DCL relies heavily on research to support its assertion that disparities are a result of discrimination, it does not do so in its policy and practice recommendations. It emphasizes that the recommendations are illustrative examples and not mandates, and notes these recommendations “are based on a review of a broad spectrum of our cases.” Recommendations include positive behavioral supports, restorative practices, and limiting the use of out-of-school suspension to infractions that threaten school safety.

The Appendix outlines a number of ideas with popular support, but a limited research base to date. And there are a number of questions to consider for any policy recommendation. Given the specific regulatory context, the question at hand is whether the approach will reduce discrimination. But in the broader policy context, we need to ask how its costs and benefits compare to each other, and to those of alternate approaches. The Departments can suggest policies as a means of complying with the legal mandate of nondiscrimination, but schools, districts, and states must also consider the needs of all their students. These needs would suggest examining metrics such as achievement and attendance outcomes, not only for students who themselves are referred for disciplinary action, but also for their peers.

In their review piece, Steinberg and Lacoe conclude there is insufficient evidence to determine the efficacy of common practices and policies proposed in discipline reforms, in large part because they have been implemented so recently.

New research on implementation challenges

While research on the effectiveness of disciplinary practices is scant, recent studies shed light on the difficulty of actually implementing changes in these practices. As the evidence base for such practices hopefully grows over time, it will be important for policymakers to realize the obstacles they must overcome if they seek to achieve impacts estimated in other contexts. In particular, schools will need to adopt alternative practices, rather than eliminating or dramatically reducing the use of exclusionary discipline in a policy void. Because the Civil Rights Data Collection quantifies the use of exclusionary discipline, it becomes a salient metric for researchers, advocates, and policymakers—but on its own, it does not paint a complete picture of a school’s approach to discipline or, more importantly, its overall climate.

Two recent studies of discipline reform in Philadelphia point to the importance of school context in implementing a district-mandated discipline reform, and to the importance of evaluating implementation alongside outcomes. In a new study, Steinberg and Lacoe use data from Philadelphia schools, relative to counterparts elsewhere in Pennsylvania over the same time period, to study a 2012 district reform restricting the use of exclusionary discipline and emphasizing positive approaches. They find only about a fifth of schools fully complied with the district’s policy. Three out of five schools reduced their use of suspensions for “conduct” offenses (but did not eliminate them as mandated), and another fifth of schools increased their use of conduct suspensions compared to baseline. Overall, the policy reduced absenteeism by about a day and a half per year among previously suspended students, and did not affect their test scores. Notably, after an initial dip, the policy coincided with an increase in the district’s racial suspension gap.

Steinberg and Lacoe then examine student outcomes for the different types of schools separately. In those schools that completely cut out conduct suspensions—in their words, the schools that “fully complied” with the reform—outcomes did not change for those students who had not been suspended in the earlier regime. That is, keeping the students who would likely have been suspended in the prior regime in the classroom did not adversely affect their peers.

But in the majority of schools that did not fully comply with the reform, attendance and achievement of previously suspended students rose, while attendance and achievement of the peers of previously suspended students fell slightly. This could reflect increased classroom disruption, consistent with critical accounts of experiences in some other districts attempting to reduce suspension rates. Some advocates will chalk up these effects to improper implementation; others will conclude this uneven implementation is what we should expect when reforms are pushed through without local buy-in or additional resources. In any case, the Philadelphia results point to the importance of looking at a range of outcomes beyond discipline rates by race, and to allowing for heterogeneity at the school level when studying district-level reforms.

To think more deeply about these results, it is helpful to consider what determined school-level compliance. A study from the Consortium for Policy Research in Education (CPRE) of Philadelphia schools after the reform found that schools using positive rather than punitive disciplinary measures had more faculty cohesion and better teacher morale, and served students of higher socioeconomic status, than schools not complying with the reform. It is unclear how complying with the reform would have affected students in other schools with greater implementation challenges. The CPRE study noted school personnel systematically reported lack of staffing and space as key barriers to implementing the discipline reform: when they needed to remove students from the classroom but lacked staffing and space for in-school options, they continued to rely on out-of-school suspension. Interestingly, whether a school officially participated in a district program for Positive Behavioral Interventions and Supports was unrelated to its implementation of the new suspension guidelines.

Going forward: Building a policy-relevant evidence base

As more and more districts and some states change their discipline policies, we have the potential to learn a lot. The Philadelphia experience serves as a useful case for researchers, policymakers, and practitioners looking forward.

Regardless of what we learn about promising interventions (for example, to mitigate educators’ implicit biases, or for positive behavioral supports), these interventions need to be implemented at the school level to work. And it may be impossible to fully implement some policies in the context of existing resource constraints. The CPRE study’s interview findings about the importance of staffing and space in adjusting disciplinary policy seem obvious ex post, but many districts are changing their policies without addressing these first-order issues.

Researchers should consider the importance of implementation research as an integral part of discipline reform program evaluation. And policymakers who want to see reductions in the use of exclusionary discipline—and improvements in the more fundamental problems for which it can be a symptom—should consider the importance of real resources, both for generating a robust evidence base and helping districts act on it.

The author did not receive any financial support from any firm or person for this article or from any firm or person with a financial or political interest in this article. She is currently not an officer, director, or board member of any organization with an interest in this article.