This blog is part of a series examining big research questions related to the scaling process in education.
Lately, we’ve been thinking about how government decisionmakers around the world adopt education innovations at scale. Research shows that national and regional decisionmakers are continually weighing the costs and benefits of whichever innovation they’re considering. There are always competing innovations, and human and financial resources are always scarce, so innovators must be strategic.
One valuable strategic tool is positive pilot data. And yet, using data from a pilot study to promise an innovation’s success at scale can be tricky. Piloting an innovation carries inherent biases that can misrepresent how the innovation will perform at scale. These biases deserve a look.
Biases when piloting an innovation
One is the famed Hawthorne effect (named not for the person who coined it—Henry Landsberger—but for the site he was studying: the Hawthorne Works, a Western Electric factory). This is the tendency of people who know they are being observed to perform better (like employees working harder when they know the boss is watching). Participants who know they’re in an innovation pilot tend to perform better than they otherwise would. This can produce overstated data.
Another bias is the Rosenthal effect (sometimes known as the Pygmalion effect). The Rosenthal effect—named after psychologist Robert Rosenthal—is the tendency of people to live up or down to the expectations that others hold for them. An example is when students perform better if their teacher has high expectations for them and perform worse when the teacher’s expectations are low. Participants in pilot studies often know that the people piloting the innovation have high expectations for them, and so, consciously or not, they will perform at a higher level. Again, this can lead to results that don’t necessarily translate when the same innovation goes to scale, where expectations may dilute or wane.
Other limitations bringing innovations from pilot to scale
Aside from these two effects, there are other methodological concerns in bringing an innovation from pilot to scale. Though they’ve been described by many researchers over the years, one helpful framework comes from Abhijit Banerjee and his colleagues, who discuss six potential errors:

1. Market equilibrium: Per-unit costs of things change when an innovation is scaled; this alters the benefit/cost ratio.
2. Positive spillover effects: Their version of the Hawthorne effect.
3. Political patronage: As an innovation expands, so do opportunities for corruption and collusion.
4. Context dependence: There may be something unique about the context in which the pilot occurred that’s not true for the scaling context.
5. Self-selection bias: Schools and local governments choosing to participate in pilot efforts are likely atypical; they’re reformers or people willing to work extra hard.
6. Pilot bias: The logistics and funding needs of a pilot are often easier to manage than administering the innovation at a larger scale.
We raise these areas of concern because they’re important for innovators who might use pilot data to present education reforms to high-level decisionmakers. We’re not saying it shouldn’t be done, just that it should be done carefully. As with so many things in life, perhaps underpromising and overdelivering is the key.
But it’s not that easy.
If promoting an innovation to national or regional decisionmakers is as much about politics as about the quality of the innovation itself, there may be a desire to overpromise, just a little. Trumpeting positive pilot results is part of that, but one should know the methodological limitations.
For implementers tasked with presenting an education innovation to national decisionmakers, this might mean being careful about presenting only positive pilot data, or about relying on data alone. A full range of information and arguments for adopting the innovation is advised. And it’s good practice to align the innovation with the political priorities of the moment and the complex motivations of the decisionmaker. Understanding the context and using the right data in the right way are key.
There’s a lesson for researchers, too. While the pitfalls of piloting can’t always be avoided, they can be mitigated. Maybe the Hawthorne effect can be lessened by discussing it with participants at the outset. Folks can be encouraged to act as they might normally do, rather than artificially help the innovation along. Perhaps the Rosenthal effect can be addressed by pilot implementers taking care not to overtly or subtly communicate unusually high expectations to participants. The other issues, framed here by Banerjee and his colleagues, can be addressed in various ways—most importantly in what expectations the innovators, implementers, and pilot researchers themselves have for their pilot (and how they design the accompanying study). The point here is to address potential biases ahead of time.
The Center for Universal Education at Brookings is proud to be partnering with the Global Partnership for Education’s (GPE) Knowledge and Innovation Exchange (KIX), through the Research on Scaling the Impact of Innovations in Education (ROSIE) project, to explore scaling-related issues with national decisionmakers. We will continue sharing with you what we learn through ROSIE in the months ahead. In the meantime, we’re interested in hearing from those of you with experience in these matters: What have you tried and what’s worked best?
This project is supported by the Global Partnership for Education Knowledge and Innovation Exchange (KIX), a joint partnership between the Global Partnership for Education (GPE) and the International Development Research Centre (IDRC). The views expressed herein do not necessarily represent those of GPE, IDRC, or its Board of Governors.
Brookings is committed to quality, independence, and impact in all of its work. Activities supported by its donors reflect this commitment and the analysis and recommendations are solely determined by the scholar.