Learning from teacher evaluations that work

David Blazar
Associate Professor, University of Maryland

October 16, 2025


  • Studies of D.C. Public Schools’ high-stakes teacher evaluation system, IMPACT, show that while it improved overall performance, it also produced inequitable outcomes across teachers by race and experience.
  • Over time, DCPS adjusted its evaluation system to make incentives more equitable, particularly for teachers in high-poverty schools, which reduced—but did not eliminate—racial disparities.
  • The study finds that replicating DCPS’s success requires not only strong incentives but also an emphasis on equity in their design and a willingness to revise them when outcomes fall short.
A middle school principal observes and evaluates a teacher during an active classroom lesson. MB Images/Shutterstock

Over the past 15 years or so, modern-day teacher evaluation systems and their associated incentive schemes have had a tumultuous, even if brief, history. In 2009, the District of Columbia Public Schools made national headlines for implementing a particularly high-stakes system: The highest-performing teachers earned large, one-time bonuses and large increases in base pay, while the lowest-performing teachers were dismissed immediately and those scoring just above them were threatened with dismissal if they did not improve the following year.

In retrospect, assessments of D.C.’s teacher evaluation system, called IMPACT, have been bifurcated. From one perspective, rigorous quantitative studies document large effects of DCPS’s evaluation system as a whole, and of its individual (dis)incentives, on overall school system performance. From another, the mayor and superintendent who designed and implemented the system were vilified by many constituents, including teachers and the local community, who argued that firing teachers was antithetical to school systems as learning spaces; that evaluation scores placed an outsized emphasis on student test-score performance; and that allocation of the (dis)incentives was highly racialized. Although the teachers’ union initially partnered in the system’s design, it ultimately sued the district to challenge teachers’ low ratings and dismissals.

Retrospective assessments of teacher evaluation across the U.S. have been similarly mixed. On one hand, the test-case reforms in DCPS were promoted nationally under the Obama administration. Through Race to the Top, the administration’s competitive grant program, many districts and states followed D.C.’s lead, making substantial changes to their daily operations to include a high-stakes evaluation component for teachers. On the other hand, most of these districts and states have not seen similar benefits. DCPS is one of just six “special” sites identified post hoc by the National Council on Teacher Quality (NCTQ) for above-average effects, with some observers suggesting that union and community resistance may have constrained broader reforms.

So, where does this leave school districts and states deciding whether to abandon or double down on teacher evaluation as one of several possible systemwide reforms to improve teacher and student outcomes? In some ways, I agree with Eric Hanushek and Margaret “Macke” Raymond, who argued in a recent Washington Post op-ed that other districts should lean further into DCPS-style evaluation. At the same time, to achieve the kind of results DCPS has seen, we need a nuanced understanding of what has gone right there—and what has fallen short.

Learning from DCPS as a special case of teacher evaluation reform

Replicating DCPS’s successes requires looking inside the “black box” of teacher evaluation reform, which is no easy task.

However, through a research-practice partnership with the district, colleagues and I examine 10 years of implementation across multiple teacher dimensions to understand how, and for whom, the D.C. evaluation system’s high-stakes incentives are working; the results were recently published in “Educational Evaluation and Policy Analysis.” Our paired descriptive and causal analyses identify several critical insights.

First, we found that allocation of the incentives varied substantially across teachers by experience and race, a pattern that closely aligns with the findings of a separate internal equity review. Variation across experience levels is not surprising: It is widely documented that veteran teachers are, on average, more effective than novices and, as such, likely should receive different consequences and rewards.

However, variation across race is both unexpected (given a growing literature on the outsized impact that Black and non-Black teachers of color have on a range of student outcomes) and inequitable. For example, in the first several years of implementation, close to 4% of Black novices (i.e., teachers in their first four years) were immediately dismissed after receiving the lowest evaluation score, and an additional 8% were threatened with dismissal if they did not improve the following year (see Figure 1). In contrast, only 1% of white novices were immediately dismissed and 4% were threatened with dismissal. At the high end of the performance distribution, close to 25% of white veterans received the highest evaluation score, which came with large financial rewards, compared with just over 10% of Black veterans.

Figure 1

Second, we found that, despite these early inequities, patterns shifted substantially over time as the district revised the evaluation system itself. Among several changes that began in the fourth year of implementation (i.e., the 2012-13 school year), DCPS restricted eligibility for the base-pay increases to teachers working in high-poverty schools, where 60% or more of students were eligible for free or reduced-price meals. As a result, allocation of the base-pay increase became much more equitable across teacher racial groups (see Figure 2). White novices were still more likely than Black novices to receive the financial award (roughly 12% versus 5%), but the gap was much smaller than before. And Black veterans were slightly more likely than white veterans to receive the salary incentive (12% versus 11%). Although these updates to the incentive scheme did not directly affect lower-performing teachers, we still observe a decline over time in the share of teachers immediately dismissed or threatened with dismissal across race and experience groups (though racial disparities remained).

Figure 2

Third, based on these descriptive patterns and on multidisciplinary scholarship, we hypothesized that differences across race and experience in the likelihood of receiving a (dis)incentive would, in turn, lead to differences in how teachers responded. Intuitively, the psychology and economics literatures on incentives and motivation suggest that individuals with the greatest expectations of success are the most likely to change their behavior in response.

This is exactly what we found. To estimate the causal effects of the two incentives (dismissal threats and salary increases), we applied a regression discontinuity design that compares teachers who scored just low enough on the evaluation system’s rating scale to receive a dismissal threat with other lower-performing teachers who scored just above that threshold (and similarly for teachers near the eligibility threshold for salary incentives). Because teachers just on either side of a cutoff should be otherwise similar, this approach allows us to make what is essentially an “apples-to-apples” comparison.
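To make the regression discontinuity logic concrete, here is a minimal Python sketch on simulated data. Everything in it is hypothetical: the rating scale, the cutoff of 250, the bandwidth of 50, and the simulated effect of 0.5 are placeholders chosen for illustration, not values from IMPACT or from the study, and the sketch omits the covariates, bandwidth selection, and other specification choices a real analysis would make. It fits a separate straight line to next-year performance on each side of the cutoff, using only teachers within the bandwidth, and reads the incentive's effect off the gap between the two lines at the cutoff.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Sharp regression discontinuity estimate using local linear fits.

    Teachers scoring below `cutoff` are treated as receiving the incentive
    (here, a hypothetical dismissal threat). Scores are centered at the
    cutoff, so each fitted intercept is the predicted outcome at the threshold.
    """
    below = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    intercept_below, _ = fit_line([x for x, _ in below], [y for _, y in below])
    intercept_above, _ = fit_line([x for x, _ in above], [y for _, y in above])
    # Estimated effect = jump in predicted next-year performance at the cutoff.
    return intercept_below - intercept_above


def simulate(n=5000, cutoff=250.0, effect=0.5, seed=0):
    """Hypothetical data: next-year performance rises smoothly with this
    year's rating, plus a jump of `effect` for teachers below the cutoff."""
    rng = random.Random(seed)
    scores, outcomes = [], []
    for _ in range(n):
        s = rng.uniform(150.0, 350.0)
        y = 0.01 * s + (effect if s < cutoff else 0.0) + rng.gauss(0.0, 0.5)
        scores.append(s)
        outcomes.append(y)
    return scores, outcomes


if __name__ == "__main__":
    scores, outcomes = simulate()
    est = rd_estimate(scores, outcomes, cutoff=250.0, bandwidth=50.0)
    print(f"Estimated jump at the cutoff: {est:.2f} (simulated true value: 0.5)")
```

A narrower bandwidth makes the two groups more comparable but leaves fewer teachers to estimate from, so real analyses typically choose it with a data-driven procedure and check robustness to alternatives. This sketch estimates the effect of crossing the threshold; where teachers could decline the incentive, as some did with the salary offer, that is the effect of being offered the incentive rather than of accepting it.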

We found that Black novices, who were the least likely to reap the benefits of the evaluation system, did not change their subsequent performance in response to either incentive. Black veterans, who likely had higher expectations of success than Black novices, responded to both incentives to a moderate degree, though their response to the threat of dismissal was stronger than their response to the salary incentive.

White novices, who we posit had some of the highest expectations of success in DCPS’s evaluation system, responded the most to dismissal threats. They improved their average teaching performance by 0.6 standard deviations (SD), much more than Black veterans (roughly 0.4 SD). At the top end of the performance distribution, white novices near the salary incentive threshold declined in performance relative to the control group, quite unlike Black veterans, who improved their performance when offered a financial reward. In other words, white novices appear to have taken advantage of the fact that the evaluation system required them only to maintain high performance, while Black veterans potentially improved as an extra safeguard for earning the reward.

Overall, we found no effects for white veterans. Despite being the most likely to benefit from the evaluation system, white veterans showed a large degree of risk aversion: They turned down the salary incentive offer at higher rates than other groups of teachers, presumably because opting into it required giving up some job protections.

Looking back to look forward

The incentive effects we observe in DCPS are notable and echo scholarship and public discourse from many others hailing DCPS as a “special case.” At the same time, we find that the effects were uneven, driven primarily by white novices and to a lesser extent by Black veterans. Black novices experienced and responded to the evaluation system quite differently, and many white veterans responded by opting out.

We further show that DCPS’s changes to the evaluation structure created a more level playing field, at least with regard to eligibility for the salary incentive. While we do not know what the counterfactual would have been if DCPS had not made these changes, we hypothesize that effects on teacher outcomes might also have looked quite different (and more tempered, particularly for Black teachers). In my view, the critical lesson here is that states and districts that want to look more like DCPS need to think not just about the overall incentives within a teacher evaluation system but also about equitable distribution of them, particularly by race. These empirical findings align with public discourse in the district, especially with continued union-led proposals for changes to the evaluation system design.

In many ways, this takeaway is similar to policy implications derived in much earlier lines of research on teacher evaluation reform. After all, teacher evaluation is one of many education policies whose presence—and reputation—has shifted across generations. Today, teacher evaluation and incentive systems are often linked to “value-added” measures and teachers’ contributions to student test scores—a relatively recent development in education policy. But the core idea of supervising teachers through observations of classrooms, performance checklists, links to some sort of student outcome, and rewards or consequences as a result of performance goes back centuries.

And so too do its successes and failures, and the associated controversies. For example, in the 1980s, Richard Murnane and David Cohen reflected on decades of attempts to enact and uphold merit-pay systems (a core incentive within teacher evaluation systems), arguing that “most merit pay plans fail and a few survive” because some systems are “special” and most are not. In primary data analyses, they searched for U.S. school districts with enduring merit-pay schemes but could find only six that lasted longer than five years, almost all of which were small districts with homogeneous student populations. Interviews with teachers and administrators in these districts revealed unique features that ran quite contrary to the very nature of personnel and contract economics, upon which modern teacher evaluation is modeled: Make everyone feel special and make merit pay inconspicuous.

Notably, DCPS was not among the districts identified by Murnane and Cohen, its students are quite diverse, and its high-stakes incentives are far from inconspicuous. Yet the core structure of the evaluation system has endured and is now in its 17th year. The fact that today’s “special” districts, as identified by NCTQ and others, differ from those of earlier decades likely means that there is more than one winning formula. DCPS (and others) provide a sort of roadmap for other districts and states.

I would add that the roadmap needs to include at least two critical components beyond the high-stakes incentives themselves: (1) attending to equity in the design of the incentive system and (2) being open to redesign if allocation of incentives does not go as planned at first. DCPS has attended to both in different ways over the years, though its redesign efforts have received far less attention than IMPACT’s initial implementation. Both components are likely to help win broad buy-in from key stakeholders (e.g., teachers and unions) who have thwarted teacher evaluation reforms in other contexts, as well as to bolster the effects of evaluation system incentives on key educational outcomes.

The Brookings Institution is committed to quality, independence, and impact.
We are supported by a diverse array of funders. In line with our values and policies, each Brookings publication represents the sole views of its author(s).