Reviewing the Evidence on Class Size

There is little doubt that reducing class size can boost student achievement in some circumstances. What is much less certain is how much of a difference class-size policies can make, and whether the impacts are large enough to justify the costs of hiring additional teachers and building new classrooms.

In a recent paper published by the Brookings Institution’s Brown Center on Education Policy, Russ Whitehurst and I review the research evidence on class-size reduction (CSR). We find that there are relatively few high-quality studies, and that these studies show mixed results.  We argue that any benefits of CSR need to be considered in the context of its substantial costs relative to the benefits that might be produced by alternative uses of the same resources.

In a review of our paper published by the National Education Policy Center, Diane Whitmore Schanzenbach of Northwestern University describes our review of the class-size literature as “misleading,” arguing that it “puts too much emphasis on studies that are of poor quality or that do not focus on settings that are particularly relevant to the debate on class-size policy in the United States.”

Applying Schanzenbach’s standard for high-quality studies yields only three studies of class size in the United States that are worthy of examination.  Two of these, the Tennessee STAR experiment and a non-experimental evaluation of the Wisconsin SAGE program, indicate positive effects of smaller classes.  The third, a rigorous quasi-experimental study of schools in Connecticut, finds no benefits of smaller classes.

These three studies are a reasonable starting point for an academic discussion of whether smaller classes have the potential to raise achievement.  Clearly they do, at least in Tennessee in the 1980s and possibly in Wisconsin in the 1990s as well (but not in Connecticut).  But the most significant CSR policies under consideration are not small pilot programs like those in Tennessee and Wisconsin but statewide mandates with costs running into the billions of dollars.

Large-scale CSR policies require the recruitment of many new teachers.  Reductions in average teacher quality, as might be expected from hiring a large crop of inexperienced teachers, could offset any direct benefits of smaller classes.  An evaluation of California’s statewide CSR policy found that it increased the shares of teachers who were new and not fully certified.  And my evaluation of Florida’s CSR mandate found no evidence of positive effects on student outcomes.

The California and Florida evaluations certainly have significant limitations, but in my view they provide preliminary evidence that large-scale policies are unlikely to produce benefits as large as those found in Tennessee.  But applying Schanzenbach’s standard for studies leaves us with no studies of these kinds of large-scale policies.  It seems awfully hard to make a case for large-scale CSR policies if we know essentially nothing about their effectiveness.

Even more important than the effectiveness of a policy is its cost-effectiveness.  As Russ Whitehurst and I argue in our paper, the right question to ask about any policy is not whether it has any effect at all, but whether it is the most effective use of limited resources.  Unfortunately, there is little rigorous evidence on the relative cost-effectiveness of various education policies.  There is a clear need for such evidence, but in the meantime it seems unwise for policymakers to mandate widespread adoption of a costly policy with uncertain benefits.

CSR may well be cost-effective in some circumstances, especially if it is implemented in a targeted way.  For example, a district may find it sensible to provide small classes for its most disadvantaged students or its newest teachers.  But CSR mandates take exactly the opposite approach in that they apply across the board and take away schools’ autonomy to decide whether reducing class size is the best use of limited resources.

The NEPC review makes several other claims with which I disagree.  I respond to them here:

  1. The review incorrectly asserts that Florida provided additional funding to all schools, when in fact all CSR funding was provided at the district level (in roughly equal amounts per pupil to each district).  As a result, my district-level results indicate the effect of CSR as compared to equivalent additional resources, but my school-level results (which also indicate no effects of CSR) do not hold funding equal because districts were free to allocate their CSR funds however they saw fit.  Additionally, the review mentions that other policies were enacted in Florida during the CSR period.  However, these policies will not contaminate my results unless they differentially impacted the treated and comparison groups of districts/schools that form the basis for my analysis.
  2. The review argues that studies of class size in middle school are “less relevant in the current policy context that is focused on early grades,” when in fact current policy discussions about class size include middle school grades.  For example, Florida’s CSR mandate applies to all grades, PK-12.
  3. The review raises a “practical problem with the cost-savings calculations,” which is that teachers are not easily divisible.  For example, a school with 100 students in a given grade cannot reduce class size by one student—it must choose between five classes of 20, four classes of 25, or three classes of 33-34.  However, this practical problem only applies to districts that are unable to manage school-level enrollment with class size in mind.  Additionally, it indicates the desirability of allowing schools and districts to manage enrollments in ways that maximize productivity.
  4. The review argues that the Brookings report “mischaracterizes the STAR findings as unusually large relative to other studies,” such as Angrist and Lavy’s study of schools in Israel.  Our contention that the Israel results are on the “lower end of the range of those found in the STAR study” is taken directly from Angrist and Lavy’s statement in their original paper that “our estimates of effect size for fifth graders are at the low end of the range of those found in the Tennessee experiment. The effect sizes based on estimates for fourth grade reading scores are only about half as large as those for fifth graders.”
  5. The review claims that the Brookings report “states that CSR is the ‘least cost-effective’ policy.”  We never make this claim ourselves.  We do report the results of the only study that carefully compares the cost-effectiveness of different educational policies (including CSR), which finds CSR to be the least cost-effective policy of those studied.
  6. The review argues that the Brookings report “bases much of its argument on the impractical and questionable assumption that any reduction in the teacher workforce can be made on the basis of instructional quality instead of according to the terms of teachers’ current contracts.”  This claim ignores the vigorous debates over “last in first out” policies now under way in statehouses across the nation.  Clearly, the “terms of teachers’ contracts” are not set in stone.  The key point is that the effects of an increase in class size are likely to depend on how the increase is implemented.  Firing only the least experienced teachers will likely have more harmful effects than basing personnel decisions on performance measures.