The next phase of teacher evaluation reform: It’s up to you, New York, New York!

Since 2009, more than 40 states have rewritten their teacher evaluation policies. Given that school systems neglected to manage classroom instruction for decades, it was inevitable that many schools would struggle to implement them. New York Governor Andrew Cuomo reignited the controversy by including a second round of teacher evaluation reforms in his budget this year. Below, I describe the most promising opportunities in the new law. Hopefully, New York will provide a blueprint for other states as they tweak their own systems in the coming years.

A Higher Standard for Tenure

Traditionally, principals have used much too low a standard when granting tenure, viewing the probationary period merely as an opportunity to weed out the worst malpractice. Under the new law in New York, the probationary period will be extended from three to four years, and no teacher rated “ineffective” in their fourth year will be able to earn tenure.

Therefore, much depends on what it means to be designated “ineffective.” As New York learned last year, when 96 percent of teachers were rated “effective” or “highly effective,” a vague standard is equivalent to no standard. The state department of education should specify that a probationary teacher is “ineffective” in their fourth year of teaching if either (i) the teacher’s average student achievement gain during their second through fourth years of teaching falls below that of the average first-year teacher in their district, or (ii) the teacher’s average classroom observation score from external observers during their second through fourth years falls below that of the average first-year teacher.[1]
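To make the proposed rule concrete, here is a minimal sketch of the two-pronged test as a Python function. The function names, the dictionary layout (teaching year mapped to that year's score), and all of the numbers are hypothetical; it simply assumes the district has already computed the average first-year teacher's achievement gain and external observation score. Following footnote [1], it uses years 2 through 4 and falls back to years 2 and 3 when year-4 data are not yet available.

```python
from statistics import mean


def window_average(scores_by_year: dict[int, float]) -> float:
    """Average a teacher's yearly scores over teaching years 2-4.

    Years 2 and 3 are required. Year 4 is included when available; otherwise
    the average falls back to years 2 and 3, as footnote [1] suggests.
    """
    if 2 not in scores_by_year or 3 not in scores_by_year:
        raise ValueError("scores for teaching years 2 and 3 are required")
    years = [2, 3] + ([4] if 4 in scores_by_year else [])
    return mean([scores_by_year[y] for y in years])


def rated_ineffective(
    gains_by_year: dict[int, float],
    external_obs_by_year: dict[int, float],
    novice_mean_gain: float,
    novice_mean_obs: float,
) -> bool:
    """Hypothetical tenure screen for a fourth-year probationary teacher.

    The teacher is 'ineffective' if EITHER their average achievement gain
    or their average external observation score over years 2-4 falls below
    the district's average first-year teacher on that same measure.
    """
    below_on_gains = window_average(gains_by_year) < novice_mean_gain
    below_on_obs = window_average(external_obs_by_year) < novice_mean_obs
    return below_on_gains or below_on_obs


# Illustrative, made-up numbers (gains in student standard deviations,
# observations on a 1-4 rubric); not drawn from any real district.
print(rated_ineffective({2: -0.02, 3: 0.01, 4: 0.00}, {2: 2.4, 3: 2.6, 4: 2.7},
                        novice_mean_gain=-0.05, novice_mean_obs=2.5))  # False
print(rated_ineffective({2: 0.03, 3: 0.04}, {2: 2.3, 3: 2.4},
                        novice_mean_gain=-0.05, novice_mean_obs=2.5))  # True
```

In the second case the teacher clears the bar on achievement gains but not on external observations, so the "either" clause makes them ineffective. Because both thresholds are recomputed from each year's cohort of novices, the same code also reflects the self-adjusting property the article describes.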

Most teachers improve their practice during their initial years of teaching. However, if, by their fourth year of teaching, a probationary teacher has not moved beyond the performance of the average novice in their district in terms of student achievement growth and measured classroom practice, students would be better off on average if the district committed to filling that teacher’s assignment with a novice teacher every year instead. A fourth-year probationary teacher who has been no more effective than a novice teacher should not receive the long-term commitment that accompanies tenure.

Such a standard would have a number of advantages. First, it reminds principals that a promotion decision involves a choice (albeit usually implicit) between two teachers: the probationary teacher and an anonymous novice. Would an NFL coach forgo 25 years of future draft picks in order to sign a mediocre player to a long-term contract? No. Yet principals in New York and elsewhere have done so every spring. Linking the standard for tenure to the effectiveness of the average first-year teacher would remind everyone of the opportunity cost involved in every tenure decision.

Second, it would be a self-adjusting standard: if classroom observation scores were to become inflated, or if the quality of those willing to enter teaching were to decline (or rise), the threshold for tenure would adjust accordingly.

Third, by relying on the scores given by external observers, the tenure decision would no longer be at the sole discretion of the local principal. Because a tenure decision affects thousands of future students, as well as future colleagues and supervisors at other schools in the district where the teacher might work, it makes no sense to leave the decision in the hands of the teacher’s current supervisor alone.

If tenure protections were reserved only for accomplished teachers, just imagine how different our schools would be.

Allow Tenured Teachers to Develop a Longer-Term Track Record

Rather than focus solely on a teacher’s performance during the most recent academic year, the teacher evaluation system should allow tenured teachers to accumulate a longer-term track record of excellence.[2] 

After the tenure decision, a teacher’s evaluation each year should depend on four parts: 40 percent of the weight should be placed on student achievement gains in all available prior school years, 40 percent should be placed on prior classroom observations, and the remaining 20 percent should be split between the teacher’s student achievement gains and classroom observations in the most recent year.
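A small worked example may help show how the 40/40/20 weighting behaves. The sketch below is hypothetical in several respects that the text leaves open: it treats each prior-year component as a simple average over all available prior years, assumes every measure is already on a common standardized scale, and splits the 20 percent recent-year weight evenly by default (the `recent_gain_share` parameter is an illustrative knob, not part of the proposal).

```python
from statistics import mean

PRIOR_GAIN_WEIGHT = 0.40
PRIOR_OBS_WEIGHT = 0.40
RECENT_WEIGHT = 0.20  # split between recent gains and recent observations


def post_tenure_score(
    prior_gains: list[float],
    prior_obs: list[float],
    recent_gain: float,
    recent_obs: float,
    recent_gain_share: float = 0.5,
) -> float:
    """Composite rating for a tenured teacher under a 40/40/20 weighting.

    Assumptions (not specified in the text): each prior-year component is a
    simple average across all available prior years, all measures are on a
    common standardized scale, and the 20 percent recent-year weight is split
    evenly unless recent_gain_share says otherwise.
    """
    if not prior_gains or not prior_obs:
        raise ValueError("need at least one prior year of each measure")
    if not 0.0 <= recent_gain_share <= 1.0:
        raise ValueError("recent_gain_share must be between 0 and 1")
    return (
        PRIOR_GAIN_WEIGHT * mean(prior_gains)
        + PRIOR_OBS_WEIGHT * mean(prior_obs)
        + RECENT_WEIGHT * recent_gain_share * recent_gain
        + RECENT_WEIGHT * (1.0 - recent_gain_share) * recent_obs
    )


# A teacher one standard deviation above average for five prior years who
# then has one bad year (one SD below average on both measures):
score = post_tenure_score([1.0] * 5, [1.0] * 5, recent_gain=-1.0, recent_obs=-1.0)
print(round(score, 2))  # 0.6
```

Under an evaluation based only on the most recent year, the same teacher would be rated a full standard deviation below average; with the longer track record the composite stays at roughly 0.6. This is the reduced vulnerability to a single bad year that the argument below relies on.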

As in many professions (including higher education), a past history of success signals that a teacher has the talent and accumulated skill to be successful in the future. The only reason to place greater-than-proportional weight on the most recent performance is to preserve teachers’ incentive to maintain effort, rather than simply rest on their laurels. Only in professions such as sales, where it is more important to incentivize current effort than to retain talent, is it necessary to ask, “What have you done for us lately?” Therefore, it does not make sense to limit evaluations to the current (or most recent) year.

Aside from recognizing the importance of talent and accumulated skill, another advantage of a longer-term perspective is that it frees teachers with a strong track record to separate their own interests from those of their weakest colleagues. Reform advocates mistakenly believe that the vast majority of teachers have nothing to fear from efforts to root out “grossly ineffective” teachers. They say, “Only the weakest one or two percent of teachers have anything to fear from new teacher evaluations.” However, they forget that the absence of any meaningful differentiation in the past has meant that many teachers do not know where they stand. When a majority of teachers think they could be in the bottom two percent under an unfamiliar and unspecified system, they will resist change. But as teachers develop a track record and become less vulnerable to a single bad year, they will be more supportive of efforts to police their own ranks.

Use Technology to Reinvent the Classroom Observation  

Children will not succeed until all teachers — both tenured and untenured — adjust what and how they teach.  Therefore, a successful teacher evaluation system must also support adult behavior change, and we must not underestimate how difficult that will be.

No one would launch a Weight Watchers club without any bathroom scales or mirrors. Student achievement gains are the bathroom scale, but classroom observations must be the mirror.

Under the new law in New York, one of a teacher’s observers must be drawn from outside the teacher’s school: someone with no personal axe to grind, whose only role is to comment on teaching. A few other districts (such as Washington, DC, and Hillsborough County, Florida) have been incorporating outside observers in recent years. However, New York is the first state to require outside observers.

No school community can change the way they teach without starting an honest conversation about their own instruction. When 96% of teachers are rated effective or better despite high student failure rates, it is a sure sign that principals have not been honest. An external perspective will make it easier for longtime colleagues to have a frank conversation about each other’s instruction.

Yet, as valuable as they might be, external observations will also present significant logistical challenges. A lot of time could be wasted as observers drive from school to school. One alternative would be to allow teachers to submit videos to an external observer in lieu of in-person classroom observations.  (For similar practical reasons, the National Board for Professional Teaching Standards has been allowing teachers to submit videos for more than 20 years.)

Doing so would have a number of advantages. For instance, teachers usually struggle because of cues they are not noticing, or because they lose track of time. It is difficult for such teachers to recognize their mistakes by reading an observer’s written notes after class. In fact, no one can recall cues they did not notice in the moment.

Giving teachers control of a camera, the opportunity to watch themselves teach, and the chance to discuss their videos with external observers, peers, and supervisors will provide a more effective mirror than any observer’s written notes.

There would be other advantages as well. Harried principals could do their observations during quieter times of the day or week. And when principals do not have sufficient content expertise, they could solicit the views of content experts.

Finally, video evidence would level the playing field if a teacher ever had to defend their teaching at a dismissal hearing: the teacher’s video versus an observer’s written notes. Video is now widely used to coach improvement in activities such as athletics, dance, and public speaking. The state department of education should encourage districts to use technology to meet the external observer requirement.


New York has not been the only state to struggle with the implementation of teacher evaluation systems.  Many systems are still failing to set a high standard for teaching. Despite the controversy, let’s hope that Andrew Cuomo is not the only governor with the courage to revisit the issue.   Students will not achieve at higher levels until teachers teach at higher levels—and that’s simply not going to happen without quality feedback and evaluation.


Kane, Thomas J., and Douglas O. Staiger (2002a). “Improving School Accountability Systems.” Working paper, May 2002.

Kane, Thomas J., and Douglas O. Staiger (2002b). “The Promise and Pitfalls of Using Imprecise School Accountability Measures.” Journal of Economic Perspectives, Vol. 16, No. 4 (Fall 2002), pp. 91-114.

Gibbons, Robert, and Kevin Murphy (1992). “Optimal Incentive Contracts in the Presence of Career Concerns: Theory and Evidence.” Journal of Political Economy, Vol. 100, No. 3 (June 1992), pp. 468-505.

[1] If student growth data from the fourth year are not available in time (given the 60-day notice required for a tenure denial), then the average from the second and third years of teaching should be used.

[2] Doug Staiger and I discuss the idea of basing school effectiveness ratings on a combination of long-term and short-term track records in Kane and Staiger (2002a) and Kane and Staiger (2002b). We drew on earlier work by Gibbons and Murphy (1992) related to CEO compensation.