Improving Accountability in the Elementary and Secondary Education Act

Debates on reauthorizing the Elementary and Secondary Education Act (ESEA) have focused on its requirement that districts give annual reading and math tests to students in third through eighth grades. On their own, tests just provide scores, but under No Child Left Behind (NCLB), a lack of growth in scores has consequences. Those consequences have fueled the debates about testing.

The NCLB scheme of consequences was designed fifteen years ago. Schools (or whole districts) that failed to improve test scores had to take a sequence of steps: first, create an improvement plan; then offer parents the choice to send their child to a better-performing school; then offer after-school tutoring to students who were not performing well (while continuing to offer choice); and, lastly, restructure if the school failed to improve five years in a row (while continuing to offer choice and supplemental services).

One aspect of this scheme stands out: only the first consequence (planning for improvement) and the last (restructuring) are about the school itself. A school that offers parental choice and supplemental services can continue to do whatever it was doing in its classrooms: same teachers, same principal, same materials. Having parents move their children to other schools might induce schools to try to improve to be more competitive and attractive to parents, but a school could ignore these forces. And supplemental services are delivered during after-school hours, usually by third-party organizations, and schools could ignore those too.

Research from the last fifteen years offers evidence about NCLB and the ways its consequences might be reconfigured to be more impactful. Hanushek and Raymond (2005) showed that having consequences improved test scores. Before NCLB, some states had accountability schemes with consequences and others had accountability schemes without consequences (the states only posted score results without tying them to penalties or rewards). Hanushek and Raymond showed that test scores on the National Assessment of Educational Progress (NAEP) rose in states that had consequential accountability but not in states that had what they called ‘report card’ accountability. Dee and Jacob (2010) compared NAEP scores in states that changed their accountability schemes to meet the requirements of NCLB—which in practice meant making consequences stricter—and showed that test scores rose more in these states than in states that did not have to change their consequences (because they already fit with NCLB’s requirements).

Together, these findings are evidence that consequences matter. Reporting test scores may be useful for letting parents and communities know which schools are doing well and which are not. But making scores count leads to improvements.

More recently, Ahn and Vigdor (2014) provided insight into which parts of the scheme of consequences improved scores. Using the long history of score data in North Carolina, they showed that schools that reached the first consequence (failing to make adequate yearly progress and having to file an improvement plan) improved more than similar schools that just barely made adequate yearly progress and did not have to file a plan. That this mild first consequence led schools to improve is analogous to a parent saying “don’t make me come up there” and the child then behaving better. Ahn and Vigdor also showed that the restructuring consequence improved test scores, whereas offering parents choice and offering supplemental services did not. Other studies are consistent with these findings.

The evidence indicates that consequences are effective (they lead to improved scores) and that they are more effective when they change what schools do, either mildly (having to plan for improvement) or dramatically (having to restructure). How can the middle consequences be made more impactful? The answer may be to focus on curriculum and instruction, which means teachers and principals.

NCLB’s requirement to administer annual tests in grades three through eight created a mountain of new data. And several states, including Texas, Florida, and North Carolina, had begun amassing score data before NCLB. Findings from research using these new data show that teachers differ hugely in their ability to generate score gains. Goldhaber (2015) summarized this research and noted that in upper elementary grades (under NCLB, required tests begin in third grade), having a lower-performing teacher (one at the 30th percentile of teachers) rather than a higher-performing teacher (one at the 70th percentile of teachers) is roughly equivalent to a student learning half as much in the school year. These differences have been measured only for reading and math, but these are core subjects, and there’s little reason to believe the magnitudes would differ for science and social studies.

The same line of research also found that it is hard to predict which teachers will be high performers. The best predictors of teacher effectiveness are how a teacher has already performed and how long the teacher has been teaching. Generally, high performers stay high performers. A massive study of teaching funded by the Gates Foundation had a related finding. On typical measures that schools use to observe teachers in classrooms, high performers had high ratings on all dimensions.

The fix, then, for schools performing poorly is straightforward but not practical: gauge effectiveness for all teachers in a district, and move high performers to low-performing schools. The Institute of Education Sciences tested something like this approach on a small scale. As part of its study, high-performing teachers were offered financial incentives to move to low-performing schools. Only one or two teachers were moved to any one school. The study found that the arrival of a high performer improved an entire grade level’s test scores. If the high performer was a fifth-grade teacher, for example, the entire fifth grade improved its test scores from fourth to fifth grade. The high performer’s class generally improved the most, but that improvement was so large it was enough to move the whole grade level up.

This fix is about as low-risk as one can get to improve the performance of a whole school, like ensuring the U.S. wins an Olympic gold medal in basketball by putting ten NBA all-stars on its team. It’s hard to imagine doing this fix on a large scale, however. A practical though possibly less effective approach would be for low-performing schools to increase the skills of their teachers. Upskilling quickly means bringing in skilled teachers as overseers or mentors, possibly transferring weak teachers out of schools and bringing in high performers, as noted already, or providing materials or technologies that improve teacher skills directly or indirectly. This is not “teacher professional development” as it’s usually understood. A school facing consequences right now has little time for its teachers to attend classes, in-service workshops, or summer institutes. A manufacturing company facing bankruptcy because it is producing defective products does not send its employees to the local community college to take courses. It locates the cause of the defects and fixes them as soon as it can.

Suppose a school continues to perform poorly despite upskilling its teachers. What next? The focus would turn to the principal. (These approaches could also happen at the same time.) Another finding emerging from recent research is that, like teachers, principals differ widely in their effectiveness. Principals of low-performing schools can be assigned a mentor or coach, given added support, or replaced by a known effective principal.

If the school’s performance problem continues, after upskilling its teachers and principal, the last consequence—perhaps admitting defeat—could be to give parents the option to take their school funding or part of it to another school. It could work something like this: any school meeting standards that accepts a student transferring from a designated (low-performing) school receives a bonus in addition to standard per-student funding. These funds can be used to supplement teacher pay through bonus or incentive schemes, improve technology, or upgrade instructional materials. Why pay a bonus? Because better schools may not want more students. More students mean larger class sizes, a more crowded facility, and added clerical and logistical responsibilities.
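The arithmetic of such a transfer bonus can be sketched in a few lines. This is only an illustration of the idea described above, not a proposed formula: the function name, the per-student amount, and the bonus amount are all hypothetical, and the sketch assumes the bonus is paid only to receiving schools that meet standards.

```python
def receiving_school_funding(per_student, bonus_per_transfer,
                             enrolled, transfers_in, meets_standards):
    """Total funding for a school that accepts transfer students.

    All dollar amounts are hypothetical. Every student, including each
    transfer, brings standard per-student funding. The bonus applies
    only to transfers from designated (low-performing) schools, and
    only if the receiving school itself meets standards.
    """
    base = per_student * (enrolled + transfers_in)
    bonus = bonus_per_transfer * transfers_in if meets_standards else 0
    return base + bonus


# Hypothetical example: 400 current students, 10 incoming transfers,
# $10,000 standard funding per student, $2,000 bonus per transfer.
with_bonus = receiving_school_funding(10_000, 2_000, 400, 10, True)
without_bonus = receiving_school_funding(10_000, 2_000, 400, 10, False)
print(with_bonus - without_bonus)  # the extra money available for pay, technology, or materials
```

In this made-up case the bonus adds $20,000 on top of the standard funding that follows the ten transfer students, which is the amount the receiving school could put toward teacher incentives, technology, or instructional materials.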

Whether the federal government should mandate an accountability structure is a different question from whether it should mandate that there be accountability. And the informational complexity of improving teacher and principal skills school by school is an example of why a mandated structure may be too blunt an instrument. Even two neighboring schools in the same district can have different skill needs related to different hiring patterns, experience levels, teaching philosophies, technology infrastructure, and so on. If states design their own accountability structures, and the federal government asks that the structures indicate how they will improve teacher and principal skills, then objectives are identified at a high level and approaches for meeting them are identified at a local level.

Evidence on the negligible effects of choice and supplemental services also needs to be viewed in perspective. Showing that parent choice and supplemental services under NCLB had little effect is not the same as showing that parent choice and supplemental services cannot be effective. The quality of implementation was weak, and averages can conceal positive outcomes for some. If states and districts had positive outcomes with either one, including these consequences in an accountability structure makes sense. As part of a state’s proposal for the scheme, the federal government could request evidence of why they are viewed as successful. This same approach of asking for evidence was used under NCLB to determine whether state assessments were adequate.

Improving teaching and school leadership will cost money. In the current draft of the reauthorized ESEA, Title II, “High-Quality Teachers, Principals, and Other School Leaders,” includes annual funding of about $3 billion. One of the title’s stated purposes is “increasing the number of teachers, principals, and other school leaders who are effective in improving student academic achievement in schools.” Focusing those resources on schools that need the most help could easily be done.

Mark Dynarski, Owner, Pemberton Research; Former Brookings Expert