The costs of misusing test-based accountability in schools

In his new book, “The Testing Charade: Pretending to Make Schools Better,” Daniel Koretz, a Harvard professor who has built a career studying educational testing, implores us to consider what decades of test-based accountability have done to U.S. public schools. His assessment isn’t pretty. Koretz describes an education system that has organized itself in almost every way to maximize test scores, often at the expense of the students it serves.

Koretz sees the challenges that have emerged from today’s test-based accountability system as a textbook application of Campbell’s law. In essence, Campbell’s law suggests that the more we emphasize student test scores in accountability, the more we should expect two types of negative consequences: people who feel accountable for test scores doing things we don’t want them to do, and test scores becoming poor, inflated measures of what they were created to show.

Researchers have documented evidence of each:

  • Inappropriate test preparation, including coaching students to receive misleadingly high scores and, with alarming frequency, unambiguous cheating.
  • Strategic manipulation of which students are tested and, of those tested, which students are most prioritized for accountability purposes (“bubble students”).
  • Reallocation of time between subjects, emphasizing subjects that are consequential in test-based accountability at the expense of those that are not; reallocation of time within subjects, emphasizing aspects of a subject that are heavily tested at the expense of those that are not.
  • Gains on high-stakes tests outpacing gains on low-stakes tests, suggesting that high-stakes tests overstate students’ true mastery of the tested domains.

The list goes on. For example, Koretz laments what an unrelenting focus on state tests has done to teaching itself. He argues that test score pressures render many kinds of desirable instruction an act of self-sacrifice by teachers, who know that the clearest path to better test scores typically doesn’t include the most dynamic, engaging lessons. In essence, we are placing too much emphasis on test scores, with too much pressure to make improbably large gains. As a result, many schools cut corners, and the ones that refuse to do so risk looking bad in comparison.

“Walk in to almost any school, and you will enter a world that revolves around testing and test scores, day after day and month after month” (Koretz, “The Testing Charade,” p. 21).

Moreover, since we use the same tests for many conflicting purposes—for example, holding teachers accountable and providing diagnostic information about student learning—we undermine the tests’ ability to serve any of those purposes well.

Koretz’s critique of today’s test-based accountability is broad, harsh, and damning, but glimpses of optimism shine through. Most explicitly, he makes specific recommendations for a test-based accountability system that would preserve and enhance the benefits of today’s accountability while mitigating its costs. More subtly, his desire to unshackle the education system from an obsessive focus on tests reflects a basic optimism about what would emerge in its absence.

Seeing Koretz tear down a system built on standards and tests leaves one wondering—or remembering—how schools would look in their absence. I found myself excited about what was possible. I also found myself nervous about it. Where Koretz worried about what material Algebra I teachers were omitting when their Massachusetts school district told them which content would be tested, I worried about what material they might omit without that guidance and incentive. Where he provided examples of wonderful instruction from teachers who courageously ventured off script, I wondered what would happen to kids whose teachers aren’t likely to handle that freedom well—and whether disadvantaged students would suffer from a reform that could amplify differences in teacher experience and quality. Where he warned of the dangers of schools focusing on tests, I thought about the dangers of the public losing its ability to see school-level test scores, especially for disadvantaged groups.

Although the book’s first several chapters conjure up these questions, a system stripped of standards and tests is not what Koretz ultimately asks us to imagine. He offers a set of recommendations that will strike most readers as sensible, if unsurprising. He talks about broadening what we measure and how we measure it and, in a particularly compelling recommendation, treating tests as a starting point for evaluating schools (e.g., for determining which schools require a closer look) rather than the end-all of evaluations.

Among these recommendations are some treats—ideas that seem clever and unfamiliar, at least to one who isn’t immersed in educational testing. For example, Koretz argues for the need to decrease the alignment between standards, curriculum, instruction, and tests, in part to make tests less predictable. He argues for training inspectors to look for inappropriate test preparation in order to introduce counterbalancing incentives into a system in which every actor, from teachers to members of state boards of education, has reason to want higher scores.

In sum, “The Testing Charade: Pretending to Make Schools Better” provides a thoughtful, accessible critique of the most powerful force in U.S. education policy today: test-based accountability. I suspect it will leave many readers, as it left me, frustrated that we have the test-based accountability system we have today—hampered by what seem, in hindsight, like such predictable and avoidable problems.

At the same time, I finished the book believing that test-based accountability, in principle, survived this critique from one of its most knowledgeable and insightful critics. As misguided as test-based accountability has been over the last couple of decades, a smarter and more restrained approach—one that attends to the realities of Campbell’s law, ultimately doing more good than harm—remains within reach.