Harnessing the value of “failure”

According to the Coalition for Evidence-Based Policy, of the 77 educational interventions evaluated by randomized controlled trials (without major study limitations) commissioned by the Institute of Education Sciences (IES) since its inception in 2002, only 7 (9%) were found to produce positive effects.[i] This is so even though most of these interventions had shown promising results in earlier research. And statistics like these likely overstate the fraction of interventions found to be successful, due to publication bias – that is, the tendency for studies that find no effect to be dismissed by the researchers and/or rejected by journals.
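The inflation from publication bias is easy to see with a toy simulation. The sketch below assumes 1,000 evaluated interventions, a 9% true success rate (the IES figure above), and – purely as illustrative assumptions, not figures from the text – that positive results are always published while null results are published only 40% of the time:

```python
import random

random.seed(0)

# Illustrative assumptions (not estimates from the article):
N = 1000                      # hypothetical number of evaluated interventions
TRUE_POSITIVE_RATE = 0.09     # the 9% success rate cited above
NULL_PUBLICATION_RATE = 0.40  # assumed chance a null result gets published

published_positive = 0
published_total = 0
for _ in range(N):
    positive = random.random() < TRUE_POSITIVE_RATE
    # Positive findings are always published; null findings only sometimes.
    if positive or random.random() < NULL_PUBLICATION_RATE:
        published_total += 1
        published_positive += positive

print(f"true success rate: {TRUE_POSITIVE_RATE:.0%}")
print(f"apparent success rate among published studies: "
      f"{published_positive / published_total:.0%}")
```

Under these assumed publication rates, the success rate visible in the published literature roughly doubles relative to the true rate, which is why the 9% figure for registered, commissioned trials is likely closer to reality than what journals alone would suggest.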

Interestingly, the high “failure rate” for well-designed studies in education is similar to the rates in other fields, including late-stage clinical trials of pharmaceuticals and A/B testing in business.[ii] Nonetheless, education researchers, practitioners and policymakers often (and understandably) feel stymied by these facts. Given the importance of education as a vehicle of social mobility and a driver of economic growth, along with the fact that we spend hundreds of millions of dollars on education research in the U.S. each year,[iii] it is imperative that we do more to learn from these failures.

So, what do we know about failure?

Past scholars have pointed out that education studies tend to find smaller effects if they utilize a rigorous research design, involve a large number of students or schools, target a more disadvantaged group or use a common standardized outcome measure (rather than an assessment tied more closely to the intervention).[iv]

More recently, researchers at Harvard University and the University of Michigan have started to study the nature of failure more systematically. Indeed, they held a “Null Results” conference this fall which brought together people working in various areas to discuss reasons for and implications of the altogether too common scenario of a zero effect.[v] Among other things, they emphasize that poor implementation often underlies zero effects.

This is consistent with my own experience. I recently completed a randomized evaluation of a professional development program aimed at training teachers in effective strategies for early reading instruction.[vi] The program is quite popular in Michigan, where hundreds of schools have utilized it over the past decade. And yet students whose teachers were randomly assigned to receive the training did no better on measures of reading performance than their peers in the same school whose teachers did not receive this training. In seeking to explain these results, school and program staff noted the challenge of implementing substantive pedagogical reform in the chaotic environment of an urban school where many other things compete for the time and attention of teachers and administrators.

So, where should we go from here?

To begin, we should take lessons from colleagues in other fields. In psychology, there is a tradition of individual studies building on each other in a systematic way that is often absent in education research. Indeed, articles in a psychology journal often contain the results of multiple lab experiments that together yield a valuable insight. In public health, implementation challenges are at the core of intervention and research efforts.

To maximize what we can learn from failure, all studies should collect data that allow researchers to understand issues of implementation, examine potential mediators and moderators, calculate the cost of the program and, whenever possible, track long-run outcomes.

Finally, we should consider more fundamental changes in the nature of education research. Education researchers should think more like engineers than bench scientists, working closely with practitioners and program developers to solve problems and not simply test theories. Importantly, researchers must be committed to helping practitioners adapt interventions and then re-evaluate them.

These ideals are reflected in a tradition of education research known as “design studies.” More recently, Tony Bryk and colleagues at the Carnegie Foundation for the Advancement of Teaching have been promoting a similar approach that they describe as “Improvement Science.”[vii]

These approaches are subject to important critiques themselves, perhaps most importantly that there is no evidence that they are effective at improving student learning at scale.[viii] However, they provide valuable insight regarding ways to design, develop and refine school-based interventions.

For their part, developers should design programs with an eye toward scalability. This means anticipating and planning for the chaos and lack of administrative support that often exists in high poverty school settings. And funders should encourage programs of research, with multiple studies that build on each other, rather than one-off projects.[ix]

To be clear, when it comes to determining “what works,” there is no substitute for a well-done randomized controlled trial (RCT). The education research community has made tremendous progress over the past 10-15 years in distinguishing correlation from causation. We cannot go back to a situation in which schools, districts or states implement new policies or mandate specific interventions on the basis of anecdote or fad. But hopefully a new approach to design and development will ensure that more evaluations yield positive effects.

When a comprehensive evaluation of a promising intervention finds no effect on student outcomes, it is discouraging for practitioners and researchers alike. There is a tendency for all parties to dismiss or rationalize the bad news, or to simply “move on” to other projects. But if we want to make progress in determining what does work, we need to spend time examining what doesn’t work.


[ii] For a non-technical discussion of related issues, see: Manzi, Jim (2012). Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics and Society. New York City, New York: Basic Books.

[iii] In recent years, the budget of the Institute of Education Sciences alone was roughly $600 million.

[iv] Robert Slavin and colleagues identify these factors in a series of research reviews. See, for example, Slavin et al. (2009), “Effective Reading Programs for the Elementary Grades: A Best Evidence Synthesis,” Review of Educational Research, 79(4): 1391-1466.

[v] Full disclosure: My wife, Robin Tepper Jacob, was one of four conference organizers, along with Heather Hill, James Kim and Stephanie Jones from the Harvard Graduate School of Education.

[vi] Jacob, Brian A. (2015). “When Evidence is Not Enough: Findings from a Randomized Evaluation of Evidence-Based Literacy Instruction (EBLI).” National Bureau of Economic Research, Working Paper 21643.

[vii] For a good summary of design studies in education, see the special issue of Educational Researcher published in January 2003. For a description of the Carnegie Foundation’s recent work, see: Anthony S. Bryk, Louis Gomez, Alicia Grunow and Paul LeMahieu. Learning to Improve: How America’s Schools Can Get Better at Getting Better. Harvard Education Publishing, 2015.

[viii] Another concern with these approaches is that close collaboration between developer and researcher might compromise the objectivity of the researcher. For a cogent critique of the earlier design studies, see Shavelson, R. J., Phillips, D. C., Towne, L., & Feuer, M. J. (2003). “On the science of education design studies.” Educational Researcher, 32(1), 25-28.

[ix] The goal structure of research grants awarded by the Institute of Education Sciences (IES) – with development, efficacy and evaluation awards – is a good first step in this direction.