Brown Center scholars and guests have been posting here since January on the Obama administration’s proposal for a $75 billion federal investment (with an equal state match) to increase enrollment in state pre-K programs. The previous posts question the large impacts on student achievement from certain taxpayer-funded pre-K programs touted by advocates of the Obama proposal.
The most credible recent study of pre-K outcomes, the federal Head Start Impact Study, found only small differences at the end of the Head Start year between the performance of children randomly assigned to Head Start vs. the control group, e.g., about a month’s superiority in vocabulary for the Head Start group. There were virtually no differences between Head Start and the control group once the children were in elementary school. Two other recent studies of pre-K programs by the federal government using rigorous random assignment designs (see here and here) produced similar findings, i.e., impacts on school readiness ranging from null to small. A piece here last month examined the association across the 50 states between the number of children enrolled in state pre-K programs and later academic performance. The association was positive but modest, suggesting that 4th grade achievement in reading and math across the nation would increase by no more than about a 10th of a standard deviation if state pre-K enrollments increased dramatically.
Advocates of Obama’s pre-K proposal and of universal pre-K in general have appealed to other research, which they contend makes the case that high quality pre-K can produce sizable and lasting gains for children. Some of that research is of demonstration programs from a half century ago (e.g., Perry Preschool) that are so different in many important ways from current state pre-K programs that findings on the impact of those programs can’t be confidently generalized to the present day. Other more recent research cited by universal pre-K advocates uses designs that have obvious limitations in terms of drawing causal conclusions. Most important, the groups being compared have not been proven equivalent prior to the pre-K experience, as would be the case in a true randomized study.
Most of the recent research studies used to support the president’s pre-K proposal utilize a particular form of a research design called age-cutoff regression discontinuity. Here’s how the studies work: If a state or other jurisdiction requires that children be 4 years old by October 1 in order to enter the pre-K program, then children who just made the deadline by having been born in late September four years earlier can attend the pre-K program, whereas those born in early October have to wait a year. Researchers take advantage of this arbitrary age cutoff by administering tests of academic skills to the children who are just entering kindergarten, having completed the pre-K program the previous year (the treatment group), and to the children who are just entering that same pre-K program (the control group). The researchers then compare test scores for the beginning kindergarten treatment group to the beginning pre-K control group. Since test scores depend on age, they adjust statistically for the one-year difference in the average chronological age of the two groups, and they conclude that any test score difference between the two groups represents the causal impact of the pre-K program. In other words, they assume that the only systematic difference between the two groups of children other than age, for which they control statistically, is that the older group has attended the state pre-K program for 4-year-olds and the younger group has not. Accordingly, any differences in test scores must be due to the state pre-K program.
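For readers who want to see the mechanics, the following is a minimal simulation sketch of the logic just described, not the code used in any of the studies. All numbers in it are hypothetical: the slope of test scores on age (0.8), the true program effect (0.3 standard deviations), and the sample size. It fits a separate straight line to scores on each side of the age cutoff and takes the gap between the two lines at the cutoff as the estimated program effect. The simulation builds in the design's key assumption, that the two groups differ only in age and pre-K attendance, so here the estimate lands near the true effect.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_age_cutoff_rd(true_effect=0.3, n_children=20000, seed=0):
    """Return the estimated jump in scores at the age cutoff (hypothetical data)."""
    rng = random.Random(seed)
    older = ([], [])    # made the cutoff last year: finished pre-K, entering kindergarten
    younger = ([], [])  # missed the cutoff last year: entering pre-K now
    for _ in range(n_children):
        # Age at testing, in years, measured relative to the cutoff.
        age_gap = rng.uniform(-1.0, 1.0)
        attended = age_gap >= 0
        # Scores rise with age (assumed slope 0.8) plus the program effect, if any.
        score = 0.8 * age_gap + (true_effect if attended else 0.0) + rng.gauss(0.0, 1.0)
        xs, ys = older if attended else younger
        xs.append(age_gap)
        ys.append(score)
    intercept_older, _ = fit_line(*older)
    intercept_younger, _ = fit_line(*younger)
    # Each intercept is the fitted score exactly at the cutoff (age_gap == 0).
    return intercept_older - intercept_younger


estimate = simulate_age_cutoff_rd(true_effect=0.3)
print(round(estimate, 2))
```

The estimate is only as good as the assumption that nothing else changes at the cutoff, which is the assumption the rest of this piece questions.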
There are several reasons that this particular version of a regression discontinuity design, though not other applications, is problematic in terms of providing causal evidence on the impact that increasing participation levels in state pre-K programs will have on student achievement:
Age-cutoff regression discontinuity addresses the wrong policy question.
The relevant policy question for an expansion of public pre-K is the degree to which the availability of the public program changes children’s outcomes compared to business-as-usual for those same children. From the perspective of a state legislator: If we invest in establishing or expanding the state public pre-K program, will the children who attend do better in elementary school than they would have if we did nothing? An appropriately designed randomized trial can provide the best answer to this question. For example, if the rollout of the new program were done gradually so that the initial supply of publicly funded pre-K slots was less than the demand, then a lottery could be used to give all interested parents a fair shot at gaining enrollment for their 4-year-old. A comparison of the school readiness upon entering kindergarten of the 4-year-olds who won vs. lost the pre-K admissions lottery would provide an unbiased estimate of the impact of the state pre-K program in the year being studied.
Note that parents losing the lottery would still be able to enroll their children in other pre-K programs or in the same pre-K programs if they were willing to pay the tuition themselves. In fact, low-income parents of 4-year-olds frequently manage to place their children in a pre-K program if a public program is unavailable. For instance, about half of the 4-year-olds in the control condition in the Head Start Impact Study were enrolled in early childhood education programs. More affluent families enroll their children in pre-K programs at even higher rates than low-income families. Thus business-as-usual in terms of pre-K access in a state that has no public pre-K program at all is likely to be substantial for 4-year-olds. This is the comparison condition that is relevant to the state legislator’s policy question.
But the control group for the regression discontinuity design consists not of 4-year-olds as in the randomized trial but of children who by virtue of the state’s age cutoff for school entry are categorized as 3-year-olds in the year before they are tested at entry into pre-K. As 3-year-olds, we can expect them to be enrolled in center-based programs at lower rates than the 4-year-olds with whom they are being compared. Further, even when the control group children are in organized child care settings in the year before they are tested, they will be in classrooms with 3-year-olds and likely receiving a curriculum that is less focused on school readiness than would be the case if they were categorical 4-year-olds. Thus, for example, whereas 55% of the children in the control group in the randomized trial might have been attending preschools and experiencing a school readiness curriculum, only 40% of children in the control group in the age-cutoff design might have been in center-based programs, with many of those programs having a daycare rather than educational focus. This would cause the regression discontinuity design to produce larger estimates of effects than would be found in a randomized trial.
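The arithmetic behind that last point can be sketched as follows. The 55% and 40% enrollment shares are the illustrative figures from the paragraph above; every other number (the gain from the public program and the gains from the alternatives available to each control group) is a hypothetical value chosen only to show the direction of the bias. The sketch also assumes, for simplicity, that every treatment group child attends the public program and that unenrolled control children gain nothing.

```python
def measured_contrast(program_gain, control_share_enrolled, alternative_gain):
    """Expected treatment-minus-control score difference, in standard deviations.

    The control group's average gain is the share enrolled in some other
    program times the gain that the other program produces.
    """
    return program_gain - control_share_enrolled * alternative_gain


PROGRAM_GAIN = 0.30          # hypothetical gain from the public pre-K program
GAIN_PRESCHOOL_FOR_4S = 0.20  # hypothetical gain from other preschools serving 4-year-olds
GAIN_CARE_FOR_3S = 0.10       # hypothetical gain from daycare-focused programs for 3-year-olds

# Randomized trial: 55% of control children attend a school-readiness preschool.
rct_contrast = measured_contrast(PROGRAM_GAIN, 0.55, GAIN_PRESCHOOL_FOR_4S)

# Age-cutoff design: 40% of control children attend a less academic program.
rd_contrast = measured_contrast(PROGRAM_GAIN, 0.40, GAIN_CARE_FOR_3S)

print(round(rct_contrast, 2), round(rd_contrast, 2))
```

The same public program yields a larger measured contrast in the age-cutoff design simply because its comparison group received less, which is not the comparison the state legislator cares about.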
Differential attrition in age-cutoff regression discontinuity studies could produce substantial bias in the results.
The ability to make strong causal inferences in any research that compares groups of people who vary in their exposure to a program or intervention depends on the groups not differing systematically on anything except program participation. One substantial threat to such equivalence in the age-cutoff design is differential attrition (participant drop out) between the treatment and control conditions.
The children in the treatment group in the regression discontinuity design are tested when they enter kindergarten after having completed the pre-K program that is being evaluated. In contrast, the control group of children is tested as they enter pre-K. The treatment group excludes children who dropped out of the pre-K program for various reasons, including moving away or having trouble of some kind in the program. These children are likely to have lower test scores than the children who finish the program (e.g., frequent moves are a well-known marker of the most disadvantaged families). In contrast, control group children who are just starting pre-K have not yet experienced the conditions leading to drop out. Thus all of the children who are in circumstances that will eventually lead them to drop out or move during the pre-K year are in the control group, whereas none of these children are in the treatment group. Mobility is a significant risk factor for children’s development and school success, and it occurs at very high levels in low-income populations. What looks like a pre-K program effect in the regression discontinuity design may in some or large part be a reflection of differences between the treatment and control groups in family circumstances or other issues that lead children to drop out of a pre-K program. Were a randomized trial being conducted, differential attrition could be measured, its potential threat estimated, and, in some circumstances, adjusted for. But since the age-cutoff design as implemented to date has no pre-test scores, there is no way to take differential attrition into account.
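A small simulation can illustrate how this kind of attrition alone could create an apparent effect. It is a sketch under stated assumptions, not an estimate of the bias in any actual study: the program is given a true effect of exactly zero, each child has a hypothetical latent "risk" score that lowers test scores, and the highest-risk children (those above an assumed threshold) leave during the pre-K year. Scores are treated as already age-adjusted.

```python
import random
from statistics import mean


def spurious_gap(dropout_threshold=1.0, n_per_cohort=50000, seed=1):
    """Treatment-minus-control gap in mean scores when the true effect is zero."""
    rng = random.Random(seed)
    treatment_scores, control_scores = [], []
    for cohort in ("treatment", "control"):
        for _ in range(n_per_cohort):
            # Hypothetical latent family instability; higher means more disadvantaged.
            risk = rng.gauss(0.0, 1.0)
            # Age-adjusted score falls with risk; the program adds nothing.
            score = -0.5 * risk + rng.gauss(0.0, 1.0)
            will_drop_out = risk > dropout_threshold
            if cohort == "control":
                # Tested at pre-K entry, before anyone has had a chance to leave.
                control_scores.append(score)
            elif not will_drop_out:
                # Tested at kindergarten entry: only program completers remain.
                treatment_scores.append(score)
    return mean(treatment_scores) - mean(control_scores)


print(round(spurious_gap(), 2))                                 # attrition present
print(round(spurious_gap(dropout_threshold=float("inf")), 2))   # no attrition
```

Under these assumptions the completers outscore the entering cohort by roughly 0.14 standard deviations in expectation even though the program does nothing, while the gap is close to zero when no one drops out.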
Age-cutoff regression discontinuity designs produce implausibly large estimates of effects.
Perhaps because of the reasons we have described, many of the studies that utilize the age-cutoff design produce differences favoring the pre-K participants that are very large compared to the effects that have been demonstrated in studies of contemporary pre-K programs using random assignment. For example, a study of the Tulsa, OK universal pre-K program using the age-cutoff regression discontinuity design found that it improved pre-reading skills compared to the control group by nine months (effect size = 0.99). An evaluation of the Abbott districts pre-K program in New Jersey found that it improved children’s vocabulary compared to the control group by 4 months (effect size = 0.28). A regression discontinuity study of the Boston pre-K program found vocabulary improvement of 6 months (effect size = 0.44).
These are large effects, especially for the Tulsa and Boston studies. To place these values in context, the improvement in vocabulary that results from attending Head Start as a 4-year-old is estimated to be about one month (effect size = 0.09) at the end of the Head Start year. The Institute of Education Sciences within the U.S. Department of Education conducted a multisite evaluation of 14 preschool curricula using random assignment. The average impact on vocabulary was virtually the same as the estimate from the National Head Start evaluation: about one month of improvement for the children in the treatment conditions compared to the control conditions (effect size = 0.10). A randomized trial of the National Even Start program found no statistically significant difference between participants and non-participants on measures of pre-academic and cognitive skills.
When well-designed and well-implemented third-party randomized trials of good preschool programs generate estimates of effects that are a fraction of the size of those generated using the age-cutoff regression discontinuity design (roughly 0.10 standard deviations versus 0.28 to 0.99), it raises questions that should motivate a careful examination of the methodology of the age-cutoff research.
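To put the size of the gap in concrete terms, the short calculation below divides each age-cutoff effect size reported above by the randomized-trial benchmark of 0.10 from the preschool curricula evaluation. The dictionary keys are labels chosen here for illustration. Note that the comparison is rough: the Tulsa figure is for pre-reading skills, while the benchmark and the other two figures are for vocabulary.

```python
# Effect sizes (standard deviation units) as reported in the text above.
AGE_CUTOFF_EFFECTS = {
    "Tulsa (pre-reading)": 0.99,
    "Abbott, NJ (vocabulary)": 0.28,
    "Boston (vocabulary)": 0.44,
}
RANDOMIZED_BENCHMARK = 0.10  # multisite preschool curricula evaluation, vocabulary

# Ratio of each age-cutoff estimate to the randomized-trial benchmark.
ratios = {
    study: round(effect / RANDOMIZED_BENCHMARK, 1)
    for study, effect in AGE_CUTOFF_EFFECTS.items()
}

for study, ratio in ratios.items():
    print(f"{study}: {ratio}x the randomized-trial estimate")
```

By this rough yardstick the age-cutoff estimates run from about three to about ten times the randomized-trial benchmark.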
In conclusion, advocates of universal pre-K and Obama’s proposal for Preschool for All ground their policy preferences in findings from a selected group of research studies that are problematic. Several recent studies using an age-cutoff regression discontinuity design carry considerable weight in the policy debate because they seem to show large effects on school readiness of present-day public pre-K programs. The limitations to these regression discontinuity studies raised here suggest the need for prudence in rolling out a new, expensive, federally supported pre-K program. Because "gold standard" randomized studies fail to show major impacts of present-day pre-K programs, there are reasons to doubt that we yet know how to design and deliver a government-funded pre-K program that produces sufficiently large benefits to justify prioritizing pre-K over other investments in education. It would be far better, in our view, for the federal government to proceed with carefully planned and rigorously evaluated demonstration projects involving interested states than to go all in on a new public benefit that will be difficult to pull back or adjust once generally available.
Russ Whitehurst is the director of the Brown Center on Education Policy and a senior fellow in Governance Studies at the Brookings Institution.
David J. Armor is Professor Emeritus of Public Policy in the School of Public Policy at George Mason University. He has held faculty positions at Harvard and UCLA, was senior social scientist at the RAND Corporation, an elected member of the Los Angeles Board of Education, and U.S. Acting Assistant Secretary of Defense for Force Management and Personnel. He has conducted research and written widely in the general area of social policy, with special emphasis on education, civil rights, and military manpower issues.