Designing accountability systems to avoid NCLB-era mistakes

Editor’s Note: This post originally appeared on the Thomas B. Fordham Institute’s Flypaper blog.

I walked away from Fordham’s School Accountability Design Competition on February 9th pleasantly surprised—not only at the variety of fresh thinking on accountability, but also at how few submissions actually triggered the “I think that’s illegal” response. I left encouraged at the possibilities for the future.

The problem of one system for multiple users

Having done some prior work on school accountability and turnaround, I took great interest in the designs that came out of this competition and how they solved what I’m going to call the “one-system-multiple-user” problem. Though the old generation of systems had many drawbacks, I see this particular problem as their greatest flaw and the area where states will most likely repeat the mistakes of the past.

Basically, the one-system-multiple-user problem is this: The accountability design is built with a specific objective in mind (school accountability to monitor performance for targeted interventions) for a single user (the state education office); but the introduction of public accountability ratings induces other users (parents, teachers, district leaders, homebuyers, etc.) to use the same common rating system. The problem arises because not all user groups share the same objective; indeed, we should expect them to use the ratings for different purposes.

Why is this such a problem? I see at least three reasons. First, it reinforces a longstanding confusion between disadvantaged students and low-performing schools, and puts the state’s seal of approval on it. This confusion leads to the second factor: Other users rely on the state’s ratings to make their own decisions, which are now less than ideal because of the confusion about what information the ratings actually convey. Incoming residents shop for homes with the best schools, for example, and well-meaning district leaders import practices from high-performing schools to low-performing ones. But these other users fail to recognize that they are implicitly shopping for more affluent neighborhoods, or importing practices between two very different student populations, where what works for one may not be the best medicine for the other. Finally, as educators and parents recognize the dissonance between the state ratings and what they actually care about in the school, they learn to distrust the rating. It’s possible that this distrust of the rating could lead to distrust in the state itself and a suspicion of its efforts to improve schools overall.

Building a system for multiple users

We can build a better system by anticipating different users and providing the information they need. Most of the other system users care principally about how well the kids in a school are learning. In other words, they prioritize performance growth, whereas most state systems are built to prioritize performance levels (rightly so, as the state wants to intervene where students need it most). Some states have incorporated growth measures into their accountability systems by combining levels and growth into a single rating; this combination can help mitigate, but does not remove, the conflation of poor students with bad schools. Other users also care about things beyond proficiency rates and academic growth, so providing multiple data points on how schools are performing on an array of non-tested dimensions is valuable.

To avoid the one-system-multiple-user problem in the next generation of accountability measures, I recommend that states evaluate schools on performance levels and growth measures separately, rather than combining them for a single summative rating. Progress measures for English language learners and non-tested school outcomes could also be nested (or not) within the growth measure component at the state’s discretion (though I recommend choosing non-tested measures with caution and considering adjusting them for student background characteristics).

Instead of a one-dimensional school grade, final ratings could have two parts—a letter grade (reflecting the proficiency levels) and a direction-of-progress component (indicating whether a school is improving or falling behind based on growth, and perhaps other measures). For example, a school may earn a “B-Falling Behind” rating, or an “F-Shooting Ahead” rating. With a system that cleanly separates these two components, the state still targets its interventions appropriately (in the failing schools, prioritizing those that continue to fall behind first), while other users see which schools are actually delivering on their mandate to teach kids.
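To make the two-part idea concrete, here is a minimal sketch of how such a rating might be represented and read by different users. Everything in it is an illustrative assumption rather than part of any state's actual system: the `Rating` class, the function names, the school names, and the restriction to just the two progress labels mentioned above.

```python
from dataclasses import dataclass

# Assumption: only the two progress labels named in the post are modeled here.
FALLING_BEHIND = "Falling Behind"
SHOOTING_AHEAD = "Shooting Ahead"


@dataclass(frozen=True)
class Rating:
    """Hypothetical two-part rating: the parts are stored separately, never merged."""
    school: str
    grade: str     # letter grade reflecting proficiency levels, e.g. "B" or "F"
    progress: str  # direction of progress based on growth

    def label(self) -> str:
        # Display form such as "F-Shooting Ahead"
        return f"{self.grade}-{self.progress}"


def intervention_queue(ratings):
    """State view: failing schools only, those still falling behind listed first."""
    failing = [r for r in ratings if r.grade == "F"]
    # False sorts before True, so "Falling Behind" schools come first.
    return sorted(failing, key=lambda r: (r.progress != FALLING_BEHIND, r.school))


def improving_schools(ratings):
    """Other users' view: schools whose students are gaining ground, at any grade."""
    return [r for r in ratings if r.progress == SHOOTING_AHEAD]


# Hypothetical example data.
ratings = [
    Rating("Adams", "B", FALLING_BEHIND),
    Rating("Baker", "F", SHOOTING_AHEAD),
    Rating("Clark", "F", FALLING_BEHIND),
]

for r in intervention_queue(ratings):
    print(r.school, r.label())
```

Because the two components are never collapsed into one index, the state's queue and a parent's search can draw on the same ratings without either view distorting the other.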

Several of the submissions to the design competition had these levels and growth components estimated and reported separately, which is encouraging, though I want to single out three particularly innovative designs in this regard:

Dale Chu and Eric Lerum’s design had a two-component final rating, intended to function roughly as I described. However, their rating uses pluses and minuses (as in C-plus); I’d discourage using this notation in order to avoid the temptation to put these measures onto a single performance index. This could lead some users to assume a B-minus is better than a C-plus, which is the wrong message and undermines the value of the two-component framework.
Polikoff, Duque, and Wrabel’s design computes four components and does not recommend combining them at all. Rather, individual users can use them and combine them for their own purposes (e.g., the state combines achievement levels and growth for targeted turnaround, district leaders use the ELS and “Other” components to help diagnose problems and provide formative feedback for improvement).
The Teach Plus team’s design had two types of measures in their system: tier-one measures, used for accountability, and tier-two measures, intended to be reported for transparency and information only. Though this team’s tier-one design combined levels and growth (and didn’t really solve the one-system-multiple-user problem, in my mind), their presentation of two tiers of performance data was innovative. As districts collect and report this type of data, all users can be better informed about the quality of their schools, which better enables educators and the public to assist the state in promoting school improvement.

To conclude, I encourage policymakers and stakeholders to pay attention to the multiple-user problem and think carefully about how we can build a system that accommodates multiple uses and therefore empowers all users who have an interest in promoting school quality.

The Brookings Institution is committed to quality, independence, and impact.
We are supported by a diverse array of funders. In line with our values and policies, each Brookings publication represents the sole views of its author(s).

Michael Hansen, Senior Fellow - Brown Center on Education Policy
