Wednesday, September 19, 2012

Research Shows: Evaluating Teachers Based on Student Tests is Bad for Students



“. . . two years ago, EPI [Economic Policy Institute] assembled a group of prominent testing experts and education policy experts to assess the research evidence on the use of test scores to evaluate teachers. It concluded that holding teachers accountable for growth in the test scores of their students is more harmful than helpful to children’s educations. Placing serious consequences for teachers on the results of their students’ tests creates rational incentives for teachers and schools to narrow the curriculum to tested subjects, and to tested areas within those subjects. Students lose instruction in history, the sciences, the arts, music, and physical education, and teachers focus less on development of children’s non-cognitive behaviors — cooperative activities, character, social skills — that are among the most important aims of a solid education.”

Many ed “reformers” argue that these losses are negligible, or that holding teachers accountable for their students’ standardized test scores trumps all else. In response, Rothstein debunked several prominent studies that these reformers have used to support their position. In the first of these studies, the Gates Foundation found a positive correlation between teachers’ Value Added Measures (VAM) scores on their students’ standardized basic skills tests and their VAM scores on internal tests of reasoning. This would suggest that the narrowing of the curriculum brought on by high-stakes testing was impairing neither students’ development of reasoning skills nor their teachers’ ability to teach those skills. Rothstein’s take on this study follows:

But although the teacher results were correlated, they were only weakly correlated. True, more teachers who had high value-added scores on a basic skills test also had high value-added scores on a test of reasoning, but it wasn’t many more. If you fired teachers who did poorly at teaching basic skills you would get rid of many teachers who did poorly at developing reasoning skills, but you would also get rid of many teachers who did well at developing reasoning skills. The first group (those who did poorly) would be larger than the second group (those who did well), but not much larger.
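To get a feel for what “weakly correlated” means in practice, here is a rough simulation sketch. Everything in it is hypothetical: the correlation of 0.3, the decision to fire the bottom quarter of teachers, and the function name `simulate_firing` are my own illustrative assumptions, not figures or methods from the Gates study or from Rothstein.

```python
import math
import random


def simulate_firing(n_teachers=10000, correlation=0.3,
                    cutoff_fraction=0.25, seed=0):
    """Count fired teachers who were below vs. above the median at reasoning.

    All parameters are illustrative assumptions, not values from any study.
    """
    rng = random.Random(seed)
    teachers = []
    for _ in range(n_teachers):
        # Hypothetical value-added score on the basic skills test.
        basic = rng.gauss(0, 1)
        # Reasoning score built to have the requested correlation with `basic`.
        noise = rng.gauss(0, 1)
        reasoning = correlation * basic + math.sqrt(1 - correlation ** 2) * noise
        teachers.append((basic, reasoning))

    # "Fire" the teachers with the lowest basic skills scores.
    by_basic = sorted(teachers, key=lambda t: t[0])
    fired = by_basic[: int(n_teachers * cutoff_fraction)]

    # Median reasoning score across all teachers.
    median_reasoning = sorted(t[1] for t in teachers)[n_teachers // 2]

    weak_at_reasoning = sum(1 for _, r in fired if r < median_reasoning)
    strong_at_reasoning = len(fired) - weak_at_reasoning
    return weak_at_reasoning, strong_at_reasoning


if __name__ == "__main__":
    weak, strong = simulate_firing()
    print(f"Fired and below the median at reasoning: {weak}")
    print(f"Fired but at or above the median at reasoning: {strong}")
```

Under these assumed numbers, the first group comes out larger than the second, but a large share of the fired teachers are still in the better half at teaching reasoning. Changing `correlation` shows how quickly the two groups converge as the link between the two scores weakens.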

A second well-known study, done by a group of Harvard researchers, found that students of teachers with high value-added scores also had better long-term adult outcomes, such as higher incomes. If this were true, it would mean that the test scores are somehow correlated with financial success, something that many parents and the public at large would likely support. However, according to Rothstein:

The flaw here is that the researchers were unable to compare the long term results of high value-added teachers with results of teachers who excelled in other ways that might, conceivably, have even larger impacts on long-term outcomes. For example, the researchers could not say whether teachers who are more effective at developing their students’ cooperative behavior, or reasoning skills (and we know from the Gates study that only sometimes are these the same teachers who are more effective at teaching basic skills) might have students who have even better adult outcomes—like earnings. If this were the case (and we have no reason to believe it one way or the other), then getting teachers to shift their attention from teaching reasoning or cooperative behavior to standardized test preparation might be lowering their students’ future earnings, not raising them.


Now that more and more data are coming out that refute or call into question the validity of Value Added Measures because of their unreliability and inconsistency (see here, here and here), some “reformers” are backing off from demands to tie teacher evaluations entirely to student test data, and are instead calling for the “reasonable” compromise of using “multiple” measures that include the test data.

This, too, is bunkum. If the test scores are unreliable, then they should not count at all. If the emphasis on tests is bad for students in the long term, it should be abandoned.

But just for the sake of argument, let’s assume that a tiny 1% of a teacher’s evaluation is based on student test scores. Even this seemingly infinitesimal amount can have an enormous impact on how a teacher is evaluated. For example, the most common and traditional method for evaluating teachers is for an administrator to go into the classroom and observe the teacher. In many school districts, this still accounts for the majority of a teacher’s evaluation. Yet if an administrator knows her teachers’ VAM scores beforehand, her observations would be biased by that knowledge, undermining the validity of the bulk of the evaluation, not just the 1% directly linked to student test scores.
