Using Test Scores in Teacher Evaluations Is Tricky

Originally published in Education Week.

New York City Schools Chancellor Joel I. Klein wants to rate teachers in the nation’s largest school system on the basis of their students’ test scores. It’s a radical idea in public education (where teachers’ credentials have always mattered more than their performance), and the stakes are high: The nation spends $400 billion a year on public school teachers’ salaries and benefits.

Klein and his deputy, Christopher Cerf, see a clear logic in giving student test scores a role in teacher evaluations: The approach is inexpensive and easy to administer, and it seemingly measures what matters most, student achievement. The vast majority of public school teachers are paid strictly on the basis of their seniority and the number of college credits they’ve racked up, rather than for their performance in the classroom. Klein and Cerf want to change that, and rightly so.

But standardized-test scores aren’t the simple solution they seem to be. For one thing, only about half of public school teachers teach the subjects, or at the grade levels, in which students are tested, which rules out a system that applies fairly to all teachers.

A second problem is that most standardized tests in use today measure only a narrow band of mostly low-level skills such as recalling or restating facts, rather than the ability to analyze information and other advanced skills. As a result, the tests tend to privilege low-level pedagogy, leaving the best teachers, those with wider teaching repertoires and the ability to move students beyond the basics, at a disadvantage, while putting pressure on the entire school system to focus on low-level skills.

And then there’s the daunting challenge of separating individual teachers’ impact on their students’ reading and math scores from the myriad other influences on student achievement. It is also difficult to draw the right conclusions about teacher performance from very small numbers of student test scores, a particular problem in elementary schools, where teachers work with a single classroom’s worth of students most of the day.

For these reasons, test scores should play a supporting rather than a lead role in teacher evaluations. School systems should also use schoolwide scores, rather than individual teachers’ scores, in their evaluation calculations, a strategy that would encourage staff members to collaborate rather than compete.

What we really need to do to ratchet up scrutiny of teachers, in New York and nationwide, is to take observations of their work with students in classrooms far more seriously. The typical teacher evaluation in public education today consists of a single, fleeting classroom visit by a principal or other building administrator who is untrained in the process. The visitor wields a checklist of classroom conditions and teacher behaviors (being presentably dressed, for example) that often don’t even focus directly on the quality of instruction.

Not surprisingly, these drive-by evaluations are mostly meaningless. A recent study of the Chicago school system by the nonprofit New Teacher Project found that 88 percent of the city’s 600 schools did not issue a single “unsatisfactory” teacher rating between 2003 and 2006, including 69 schools deemed by the city to be failing educationally. To their credit, Chancellor Klein and his deputy are trying to address this educational malpractice.

But we need to strengthen evaluations of teachers’ classroom work, not merely work around them. To get a fuller and fairer sense of performance, evaluations should focus on teachers’ instruction—the way they plan, teach, test, manage, and motivate.

Evaluations should be based on clear, comprehensive standards of strong teaching practice that have emerged in recent years. And they should encompass multiple observations by multiple evaluators, with a substantial role going to teams of trained school system evaluators free of the inclinations to favoritism and conflicts of interest that plague evaluations by principals—and that led to the rise of credential- and seniority-based pay scales in public education 80 years ago.

Such evaluations are more labor-intensive, and thus more expensive, than principal drive-bys or evaluations based on test scores, and so they are tougher to implement for administrators trying to bring about change on the scale required in large urban school systems. But it’s an investment worth making, because teacher evaluation has a larger role to play than merely weeding out bad teachers.

Comprehensive evaluation systems focused on improving teachers’ performance signal to teachers that they are professionals doing important work, and in so doing help make public school teaching more attractive to the sort of talent that the occupation has struggled to recruit and retain.

As one measure of the importance of creating a more professional working environment in teaching, Public Agenda and the National Comprehensive Center for Teacher Quality found in a national survey of public school teachers last year that, if given a choice between two otherwise identical schools, 76 percent of secondary teachers and 81 percent of elementary teachers would rather be at a school where administrators supported teachers strongly than at one that paid significantly higher salaries.

Most teachers and their unions reject the idea of being judged—or paid—individually on the basis of their students’ test scores. The United Federation of Teachers, which represents 74,000 New York City teachers, has vowed to fight Chancellor Klein’s plan “on all grounds—educational, legal, and moral.”

But in schools that combine test-score calculations with classroom observations that go far beyond today’s superficial principal checklists, opposition to including student test scores in teacher ratings drops off dramatically.