Why is accountability always about teachers?


Most education reform efforts focus on what teachers are doing: professional development, new curricula, bonuses and incentives to raise scores, and so on. All rest on the belief that teachers will teach more effectively if their skills are improved, their tools are better, and their efforts are more energetic.

Teachers are the largest group of staff within the K-12 system, and their skills matter for its performance. But they do not manage or direct the system. Would any organization that wanted to improve expect to get it done by upskilling only its line-level staff? If Walmart were losing money, would it conclude that management was doing a great job but the floor staff needed professional development? The more natural focus would be on the decisions and actions of executives, managers, and senior administrators.

An average teacher is highly experienced

The focus du jour in education reform (currently personalized learning, differentiation, and hybrid learning are topical) typically presumes that teachers have an appetite for changing their classroom practices. But teachers are highly experienced, and they work in highly constrained settings.

An average K-12 teacher has been teaching for about 14 years. A typical school year is 180 days and a typical school day is 6.5 hours, so the average teacher has taught for more than 16,000 hours. During those hours, teachers have worked with hundreds of children; for middle- and high-school teachers, it may be thousands. From those many hours, teachers have amassed pedagogical practices they believe work for their students. These practices may be effective, flawed, or plain wrong, but the point is that teachers cannot easily be separated from their practices.
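The classroom-hours figure follows directly from the averages cited above. A back-of-the-envelope sketch (the inputs are the rough averages from the text, not precise statistics):

```python
# Rough classroom-hours estimate for an average U.S. K-12 teacher,
# using the approximate figures cited above.
years_teaching = 14    # average years of experience
days_per_year = 180    # typical school year, in days
hours_per_day = 6.5    # typical school day, in hours

total_hours = years_teaching * days_per_year * hours_per_day
print(total_hours)  # 16380.0 -- hence "more than 16,000 hours"
```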

And teachers face many constraints in their classrooms. They are assigned to grade levels, their students are assigned to their classrooms, their textbooks and supplies (including software and computers) are chosen for them, and the entire school or district moves in lockstep on a schedule that dictates how much time is spent on each subject. Teachers do control how much time they invest outside the classroom in exploring new teaching approaches or learning about what is working for others. But any ideas they find through this kind of self-study still need to fit within the constraints. A teacher who reads about an interesting method for, say, teaching fractions has to contend with a textbook and test materials built around a different approach.

Evidence is lacking on how teachers can be more effective

A group as large as teachers (there are about 3.1 million public school teachers) will include some who are more effective and some who are less effective, and ample evidence exists that teachers differ in their effectiveness. With the exception of how many years a teacher has taught, however, what separates highly effective teachers from less effective ones has proven a tough nut to crack, and, relatedly, far less evidence exists about how to move teachers from the lower end of the effectiveness curve to the higher end.

The New Teacher Project (TNTP) recently looked at professional development in large school districts and a charter school network and concluded that “We found no evidence that any particular kind or amount of professional development consistently helps teachers improve.” It’s not for lack of spending: TNTP estimated that large districts were spending about $18,000 per teacher per year on professional development. TNTP also reviewed the broader research literature and commented on findings from the most rigorous studies that had been done by the Institute of Education Sciences: “teachers who received the best of the best [professional development] were no more likely to see large, lasting improvements in their practice, knowledge, or student learning. In fact, many did not use the techniques they’d been trained to employ—even when researchers were in the room to observe them.” This last point may relate to the teacher experience noted above: a teacher who has been teaching a subject for years might not be easily convinced to teach it some other way on the strength of a workshop presentation.

These ‘top down’ approaches to improve teaching have been complemented by ‘bottom up’ approaches that offer financial incentives for teachers to improve. The idea of financial incentives is based on logic that economists find eminently sensible—workers work harder when money is at stake, so giving teachers higher pay for higher test scores should cause test scores to go up.

An attractive feature of financial incentives is that teachers can plot their own paths to improvement. This is the ‘bottom up’ aspect. It’s an idea worth testing, and two recent studies have. Both were large and designed to the highest research standards. They are worth discussing at some length because both studies reveal insights about teachers and districts that add to the picture of how accountability might be better focused.

The first study examined incentive pay (bonuses) for middle-school math teachers in the Nashville school district. The largest bonus was substantial: $15,000 a year for teachers whose performance placed them in the top five percent of teachers based on historical district data. The district’s current salary for a teacher with 14 years of experience (the US average) and a master’s degree is $56,000, so the top bonus was roughly a quarter of annual salary. Bonuses of $5,000 and $10,000 went to teachers in the top 20 percent and top 10 percent, respectively. The constraints on teachers mentioned above were not relaxed by the incentive-pay program: teachers were still given their grade levels, their students, and their curricula.
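The bonus tiers can be put in perspective as shares of the $56,000 salary cited above. A simple illustrative calculation using the article's figures:

```python
# Nashville bonus tiers expressed as shares of the cited $56,000 salary.
salary = 56_000
bonuses = {"top 20%": 5_000, "top 10%": 10_000, "top 5%": 15_000}

for tier, bonus in bonuses.items():
    # e.g. the top-5% bonus of $15,000 is about 27% of salary
    print(f"{tier}: {bonus / salary:.0%} of salary")
```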

But test scores did not improve. Two other interesting findings suggest why. One was that teachers reported on surveys that they did not do anything different in response to the potential bonuses because they were already working as effectively as they could. A second was that teachers did not believe that a teacher who earned a bonus was a better teacher, or that teachers who did not earn bonuses needed to improve. It is hard to expect bonuses to do much if teachers believe they are already working at full capacity and do not accept the logic of bonuses.

A second study measured the effects of incentive pay (the federal ‘Teacher Incentive Fund’) in 10 districts and reported similar results. Test scores barely moved; they improved by an amount roughly equivalent to one to two-tenths of an IQ point. The study also reported that districts did a terrible job of explaining bonuses to their own teachers. In the fourth year of the program, forty percent of teachers who were eligible for bonuses did not know they were eligible. Eligibility was determined by school, but even teachers in the same school differed on whether they thought they were eligible. And when asked to predict how large a bonus they could earn for increasing scores, teachers gave answers far smaller than what the program would actually pay: teachers reported being eligible for a maximum bonus of about $3,000, while districts reported paying maximum bonuses averaging about $9,000.
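To see how small that effect is, a test-score effect size can be converted to the IQ scale, whose standard deviation is 15 points. The 0.01 standard-deviation effect used below is an illustrative assumption of roughly the right order of magnitude, not a number reported by the study:

```python
# Converting a small test-score effect (in student-level standard
# deviations) to IQ points. The IQ scale has a standard deviation of 15.
# The effect size below is a hypothetical illustration, not study data.
IQ_SD = 15
effect_size_sd = 0.01  # assumed effect, in standard deviations

iq_points = effect_size_sd * IQ_SD
print(iq_points)  # 0.15 -- between one and two-tenths of an IQ point
```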

One of the program’s requirements was that districts create systems for awarding bonuses that differentiated between teachers—the whole idea of bonuses is to reward above-average performance. Yet seventy percent of teachers ultimately received bonuses. The bonuses averaged $2,000, about 4-5 percent of average teacher salaries. Knowingly or unknowingly, districts essentially converted their bonus programs into teacher raises.

Accountability needs to be more equitable

The findings suggest that top-down and bottom-up approaches to improving teaching are unlikely to do much. Yet the last ten years have seen tremendous growth in teacher and principal evaluation systems that rely on test scores and observations to rate teachers. If sending teachers to professional-development workshops or paying them real money to improve does not yield results, it is at best unclear why spending significant amounts to measure and observe their performance will.

The systems focus their measurement and analytic machinery on teachers, who have the least latitude to change what they do. Senior leaders make decisions that affect every aspect of life for teachers in schools. Senior leaders hire teachers, using criteria they’ve chosen. They grant tenure using criteria they’ve chosen or agreed to. Senior leaders assign teachers to grade levels, give them textbooks and curricula, buy and set up their technology, lay out their schedules, create the disciplinary policies they must follow, and choose the programs for working with students learning English, students with disabilities, students with reading difficulties, and students who are homeless. And senior leaders decide when to change all of these: they adopt new curricula, set up new testing programs, roll out new technology, change subject schedules, and modify discipline policies.

Teachers are not making these decisions. They might be asked for input on the decisions, but they do not make them.

A teacher does not get to declare that, next year, the school will use a particular math series as its curriculum.

An important qualification is that some systems, such as the DC IMPACT system, provide a basis for firing ineffective teachers and rewarding highly effective ones. Eric Hanushek has written elsewhere about the high costs associated with ineffective teachers. To date, these systems have rated large proportions of teachers as effective, and, as I have noted previously, it is unlikely that 98 percent of teachers really are effective, if the word has any meaning. But being able to identify the lowest-performing teachers at least provides administrators with a basis for removing them.

Accountability for administrators is complicated when organizations are not run for profit. Private-sector organizations have profit as a natural metric, and the market does the work of measuring it. School districts have no measure of profit to gauge their success. They need to decide which ‘interventions’ or processes to test, which outcomes to focus on, how those outcomes will be measured, and who is responsible for them. For example, Whitehurst has previously written about the promise of selecting more effective textbooks and curricula. Selecting a new math series, then, should begin an evaluation cycle: decide on the outcomes, how they will be measured, and how much they should be expected to improve; then assess the outcomes and learn whether the series worked. If the series seems hampered by implementation factors, adjust them and assess again. If outcomes improve, the improvement will be credited both to teachers and to the administrators who decided to try the new math series. Equity in accountability is just as desirable in schools as it is in private-sector organizations.

Finding what works to improve involves risk—ideas might work out or they might not. Under the current system, administrators create the structures and administrators come up with the ideas about what might work. Teachers are then assessed on the results. We need to think about how to shift risks back to where they belong, which is with those who make the decisions.

The author did not receive any financial support from any firm or person for this article or from any firm or person with a financial or political interest in this article. He is currently not an officer, director, or board member of any organization with an interest in this article.


  • Footnotes
    2. A recent study by the Institute of Education Sciences and Mathematica Policy Research reported that having a teacher at the 10th percentile of effectiveness compared to having a teacher at the 90th percentile of effectiveness is roughly equivalent to a student achieving 15 percentile points higher on a reading test and 19 percentile points higher on a math test. Differences of this size are rare in education research.
    7. See table IV.9 on page 65.
    8. Figure IV.11, page 67, ibid.