As states and districts implement new teacher evaluation systems, they will struggle to differentiate between excellent and poor instruction, as well as to define a minimum standard of effectiveness. The task is complicated by the legacy of perfunctory evaluations in K-12 education, in which more than 98 percent of teachers were given the same “satisfactory” rating.
To avoid making “effective” the new “satisfactory”, here is an alternative standard to consider: after a probationary period, a teacher is “effective” if and only if, based on the available evidence (such as from classroom observations, student surveys and student achievement gains), his or her predicted impact on students exceeds that of the average novice teacher. In other words, if, after a few years in the classroom, a teacher’s predicted effectiveness is below that of the average novice teacher in the same grade level and subject, then he or she would fail to meet the minimum standard of effectiveness required for tenure.
Such a definition has two advantages: First, it makes explicit the decision a principal implicitly makes every time he or she retains a non-probationary teacher: to forgo the opportunity to recruit a novice teacher as a replacement. Would an NFL coach give up a future draft pick for an experienced player he expects to perform worse than the average rookie? Not if he were trying to win. Would a principal promote or retain a teacher with expected performance below that of the average novice? Not if he or she had the students’ interests at heart.
Second, it is a self-correcting standard. If principals were to label an unreasonably large proportion of teachers as “exemplary”, then the observation score required to achieve “effective” status would increase. Moreover, if the pipeline of teachers were to dry up, and the quality of new recruits were to decline, then the standard for “effectiveness” would adjust downward. Likewise, if the quality of teacher preparation programs were to increase, then the standard for tenured teachers would be raised.
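To make the self-correcting property concrete, here is a minimal sketch of the rule in Python. The function names and all of the scores are hypothetical, invented purely for illustration; a real system would use a composite of observations, student surveys and achievement gains rather than a single number.

```python
from statistics import mean

def novice_benchmark(novice_scores):
    """Average predicted effectiveness of novice teachers (the moving threshold)."""
    return mean(novice_scores)

def is_effective(teacher_score, novice_scores):
    """'Effective' only if the teacher's predicted impact exceeds the novice average."""
    return teacher_score > novice_benchmark(novice_scores)

# Hypothetical predicted-effectiveness scores for a cohort of novices.
novices = [2.0, 2.5, 2.5, 3.0]
third_year_teacher = 2.75

# With honest scoring, this teacher clears the novice average.
honest_result = is_effective(third_year_teacher, novices)

# Suppose observers inflate the novices' scores by half a point.
# The threshold rises with them, so the same score no longer qualifies.
inflated_novices = [s + 0.5 for s in novices]
inflated_result = is_effective(third_year_teacher, inflated_novices)

print(novice_benchmark(novices), honest_result)
print(novice_benchmark(inflated_novices), inflated_result)
```

A fixed cut score would have no such response: it would stay put while the scores around it drifted upward.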
In their rookie year of teaching, most teachers struggle. For instance, in the typical school district, the average classroom of students assigned to a rookie teacher loses 0.05 to 0.10 standard deviations in student achievement by the end of the year, relative to students with similar starting points. Most teachers improve from their first to their second to their third year of teaching. However, a substantial share of teachers will still underperform the average rookie in their third year: in data from several school districts, about 15 to 30 percent of third-year teachers would have predicted effectiveness below that of the average novice.
When a teacher fails to demonstrate effectiveness, a principal should retain the ability to offer tenure. But tenure should no longer be the default outcome in such cases. A principal should be willing to take additional steps, such as submitting a plan for supporting a teacher’s development or informing the parents in the school.
Two additional safeguards would contribute to the fidelity of the evaluation system:
First, observers should not only be trained, they should be asked to demonstrate their ability to apply the standards on a set of sample videos (which have been pre-scored by master observers). For instance, an observer in any district using Charlotte Danielson’s Framework for Teaching should demonstrate his or her ability to recognize what unsatisfactory, basic, proficient and advanced questioning strategies look like. However, while necessary, such training is unlikely to be sufficient to prevent score inflation. In the Measures of Effective Teaching project, we learned that even when principals could apply the standards when observing teachers from outside their schools, they scored their own teachers higher than other observers did, granting a significant “home field advantage”.
Second, states and districts should compare the distribution of ratings in different schools or districts and study the correlation between observation ratings and other measures such as student growth measures or student surveys. We have learned a lot in recent years about what those relationships should look like. Outliers should be investigated. However, if every principal were inflating scores similarly, the inflation would be impossible to detect through such comparisons.
The logic behind the “better than the average novice” threshold reflects the decision that principals implicitly make when they retain a teacher and forgo hiring a new teacher. Moreover, the standard is responsive to factors such as score inflation and the supply of teachers. A teacher should not be labeled “effective” unless he or she is outperforming the average novice teacher.
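The second safeguard can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the school names, the scores, the half-point tolerance and the function names are hypothetical, and a real audit would rest on far more data and a more careful statistical model.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def flag_outlier_schools(scores_by_school, tolerance):
    """Schools whose mean observation score exceeds the average school mean by more than tolerance."""
    school_means = {s: mean(v) for s, v in scores_by_school.items()}
    overall = mean(list(school_means.values()))
    return sorted(s for s, m in school_means.items() if m - overall > tolerance)

# Hypothetical observation scores, one per teacher, grouped by school.
scores = {
    "School A": [2.0, 2.5, 3.0, 2.5],
    "School B": [2.5, 2.5, 2.5, 2.5],
    "School C": [3.5, 3.5, 3.5, 3.5],
}
flagged = flag_outlier_schools(scores, tolerance=0.5)
print(flagged)  # School C stands out from its peers

# If every school inflates by the same amount, the comparison looks unchanged.
inflated = {s: [x + 0.5 for x in v] for s, v in scores.items()}
flagged_after_inflation = flag_outlier_schools(inflated, tolerance=0.5)
print(flagged_after_inflation == flagged)

# Hypothetical check of observation scores against student growth (in standard deviations).
observation = [1.5, 2.0, 2.5, 3.0, 3.5]
growth = [-0.10, -0.05, 0.00, 0.02, 0.08]
print(pearson(observation, growth))
```

The comparison across schools catches the one school that rates its teachers well above its peers, but it returns the same answer when all schools inflate together, which is why a threshold tied to the average novice matters.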