Showing posts with label standardized exams. Show all posts

Thursday, November 22, 2012

Data Driven Nonsense from Harvard and the Gates Foundation



A new Gates-funded Harvard study has found that Los Angeles Unified School District (LAUSD) teachers vary substantially in quality (more than in other districts) and that the district disproportionately places inexperienced teachers in lower performing classrooms (as in other districts). The study, Human Capital Diagnostic, was done by the Strategic Data Project (SDP), which is connected with Harvard University’s Center for Education Policy Research.

The biggest problem with this study is that it is a bunch of nonsense.

Let’s start with the authors’ most profound claim: The best teachers in LAUSD provide the equivalent of eight additional months of instruction during the school year compared with the district’s worst teachers. Since their research was based entirely on student scores on the California Standards Tests (CSTs), a high-stakes exam used to rank schools, and the top teachers in the study were the ones with the largest student gains on these tests, what they are really saying is that the best teachers provided the equivalent of eight additional months of test prep.

Big wow!

The authors state that there is “no specific cut-off for determining whether an effect size is large or small,” but they assert that a standard deviation of 0.2 is considered large in education research. The study found that the difference between a 25th and a 75th percentile teacher is one-quarter of a standard deviation (0.25). This would be significant if it were based on a meaningful measurement of teacher effectiveness. Unfortunately, all it really says is that some teachers are better than others at squeezing out student gains on an otherwise lousy exam. It does not tell us whether their students are becoming self-motivated, independent learners or competent critical thinkers and problem-solvers. Furthermore, the study provided no explanation for how it determined that 0.25 standard deviations was equivalent to eight months of instruction.

The authors also claim that Teach for America (TFA) and Career Ladder teachers have higher effects on their students than other novice teachers, by 0.05 and 0.03 standard deviations, and they even attribute a gain of one to two months in additional learning to these relatively small differences. They make similar claims for National Board Certified teachers, whose students’ test gains were 0.03-0.07 standard deviations higher than those of other teachers. Yet, if an effect of 0.2 standard deviations is considered large in education, then an effect of 0.03-0.07 standard deviations ought to be considered small or even insignificant.

While the difference may be insignificant, the fact that this was being researched in the first place is not. TFA provided 13% of new hires to the district over the past six years (according to the study’s authors) and it would be of great interest to the district’s administrators to show that the investment was worthwhile. So let’s assume for the sake of argument that the difference between TFA recruits and other novice teachers was significant. What would this mean? TFA teachers may in fact be more willing than other novice teachers to work long, unpaid overtime hours and to substitute “drill and kill” style teaching for quality student-centered instruction, both of which could produce higher test scores without improving the quality of student learning.

Perhaps a bigger problem with this study (like all studies and reforms based on student test data) is that numbers are not the only relevant type of data in education and sometimes not even the best. Esther Quintero, writing for the Shanker Blog, talks about the “streetlight effect,” from the parable of the drunk who searches for his lost wallet under the streetlight, not because he lost it there, but because the light is better there and it would be easier to find it if it happened to be there. Student test data is easy to access now that it is required of every district in the U.S. under No Child Left Behind (NCLB); it is under the streetlight. Yet, at best, it is only a proxy or very rough estimate of teacher quality, since it considers only a small part of what teachers are expected to do.

Quintero also correctly points out that NCLB has helped to institutionalize what counts as data. “Scientifically-based research” is now limited to standardized test scores, which, as it turns out, are not particularly scientific. Case studies, ethnographies, teacher observations and portfolios, and other qualitative data are considered unacceptable.

One promising finding from the study was that teacher performance after two years was found to be a good predictor of future effectiveness. In other words, the current system of giving tenure to teachers after two years of good evaluations makes sense. Teachers are not getting worse after two years. Novice teachers are not better than veterans and should not have the right to bump them during layoffs, and LAUSD is not top heavy with a bunch of cranky veterans who can no longer teach.

Monday, October 22, 2012

LAUSD Lays Down Hammer on Evaluations

Image from Flickr by r8r

Los Angeles Unified School District (LAUSD) has filed a declaration of impasse, according to the Daily News, after failed negotiations with United Teachers of Los Angeles (UTLA) over a new teacher evaluation system based on student test scores. LAUSD is under court order to revise its evaluation system by December 4. However, Superior Court Judge James Chalfant has mandated that the district negotiate with the union over the new system.

UTLA President Warren Fletcher said last week that his 40,000-member union was engaged in “good-faith bargaining with LAUSD officials over developing a fair and effective teacher evaluation system.” This, of course, is typical union mumbo jumbo meant to convince the public that the union was playing by the rules and trying to do right by the students and that any blame for the stalemate lies squarely on the shoulders of LAUSD.

While Fletcher’s quote may sound good to the press, it is patently untrue. If UTLA were really interested in creating a fair and effective teacher evaluation system, it would refuse to accept any use of student test data in its members’ evaluations, as such data is unreliable, inconsistent and leads to many false positives and negatives (see here, here and here). This is obviously bad for teachers, who could receive bad evaluations despite being good teachers simply because they work in a low income school with the perennially low test scores that are common in lower income schools. However, it is also bad for children in several ways. They could end up losing excellent teachers because of the inaccuracies inherent in this evaluation system. Conversely, bad teachers could easily slip through the cracks and remain in the classroom because they happen to work in higher income schools, which tend to have higher test scores and larger gains on their scores.

If UTLA and LAUSD were truly interested in a fair and effective evaluation plan they would demand that well-trained outside evaluators be brought in to evaluate teachers blindly, using a combination of classroom observations and portfolios. This would eliminate the bias inherent in being evaluated by the boss (i.e., site administrators), who may ding a teacher for not embracing and carrying out his pet projects and reforms with sufficient vigor or for speaking out on children’s or teachers’ behalf at faculty or board meetings. It also would eliminate the problem of site administrators being poorly trained and lacking the time to make sufficient and competent observations and evaluations. And it would eliminate the bias and problems inherent in the use of student test data.

That LAUSD is declaring impasse suggests that they are fed up with UTLA’s position on the matter. Yet UTLA, despite Mr. Fletcher’s criticisms, has embraced student test data to evaluate its teachers. The big stumbling block, at this point, is that LAUSD wants the data to be based on individual classrooms, which can be directly linked to individual teachers, whereas UTLA wants it aggregated school-wide.

The union is also saying that it wants evaluations that provide useful feedback for teachers so they can improve their practice. Yet regardless of how student test data is acquired or aggregated, it fails to provide such feedback. This is because the test scores are a measure of student test taking ability. They tell us nothing about how students learned the content or developed their test taking skills, and their scores are influenced far more by their socioeconomic status than by their schools and teachers.

UTLA, having already accepted the use of student test data, is unlikely to strike over the matter, especially when the district is under court mandate to include student test data in its new evaluation system. Unions have become overwhelmingly averse to challenging court orders and injunctions (e.g., the Chicago Teachers Union, which supposedly struck over student test data being used to evaluate teachers, was, in reality, only fighting over the extent to which it would be used, having already accepted that it was required by Illinois state law). Thus, the question is not whether, but how, student test data will be abused to evaluate teachers.

Thursday, October 18, 2012

LAUSD Wins Large Merit Pay Grant

Huck/Konopacki Labor Cartoons

What could be more appealing than the idea that if you work hard, play by the rules, and generally excel at what you do, you will be rewarded with higher pay, status and power? This, of course, is known as meritocracy and it is something that most Americans believe prevails in this country. And why not? It seems perfectly fair and reasonable. People shouldn’t get ahead by cheating or because they know the right people or because they have the money and power to game the system.

Of course it is easier and more appealing to believe in meritocracy than to accept the reality that wealth begets more wealth and that few people ever transcend the socioeconomic status of their parents. The sad fact is that you can play by the rules, work your butt off, perhaps even kiss up to the boss, and still make little, if any, progress up the ranks in status or income.

Nevertheless, free market education reformers have been crying for years that the educational sky is falling, and that merit pay would encourage the competition necessary to make public education profitable, er successful. Thus, the Obama Administration has been giving away millions of dollars to school districts willing to implement this “reform.”

Los Angeles Unified has won the lion’s share of these grants in the form of a five-year, $49.2 million award from the Teacher Incentive Fund, a Department of Education program, the Daily News reports. The fund doled out a total of $290 million to 35 recipients in all. In addition to LAUSD, there were three L.A. area charter school networks that received federal funds for implementing merit pay schemes: Aspire Public Schools ($27.8 million), Green Dot ($11.7 million), and Alliance College-Ready Public Schools ($8.9 million).

As with meritocracies in general, merit pay for teachers does not result in better societal or educational outcomes, nor does it ensure that the rewards go to the best teachers. For example, a mediocre teacher who happens to work in an affluent school may see large gains in her students’ standardized test scores through no effort of her own, while a superb teacher in a lower income school might work 80 hours per week, make home visits, offer weekend tutoring and still see declines in test scores. Consequently, a teacher could receive a merit bonus even though she did not work any harder or teach any better than her colleagues.

Aside from student test scores, which are nominally objective, teachers are still evaluated by their administrators, which is incredibly subjective and biased. Administrators tend to be poorly trained and lack the time to make sufficient and competent observations and evaluations of their teachers. Furthermore, administrators are often biased in favor of teachers who share their philosophies, experiences, goals and pet projects, which could lead to a more positive evaluation and merit raise for a mediocre teacher simply because he has kissed up to the principal, volunteered on the principal’s favorite committee or agreed to pilot one of her pet reforms.

Since there really is no accurate, consistent or reliable way to quantify teaching quality or to determine who deserves merit pay, teachers’ unions and school districts have, until recently, mutually agreed to contracts that pay teachers according to their years of teaching experience and additional education beyond their teaching credentials. This prevents districts from paying people more based on their gender, ethnicity or willingness to kiss up to their bosses.

In addition to creating an unfair and unaccountable system of rewards, the LAUSD grant will be used to encourage overwork and stress among teachers. Indeed, Superintendent John Deasy said the money will be used, in part, to “. . . develop teacher leaders without teachers having to leave the classroom, and principals can develop new leaders in their schools.” In other words, some teachers will get bonuses for working additional hours beyond their normal school day to become “leaders,” a code word which generally means shock troops for pushing other “reform” efforts.

Though they may get compensated monetarily for the extra work, they will not be compensated with extra time, which is what they will really need if they are to continue providing the best quality teaching for their students. Any extra responsibilities necessarily eat into the time teachers spend designing and preparing new lessons; meeting with parents, students and colleagues; attending meetings; and reading essays, lab reports and projects. Overwork does not necessarily translate into better teachers or students.

Thursday, September 20, 2012

Is Everything We’re Teaching Garbage?


Artificial intelligence theorist and education reformer Roger Schank has argued that virtually everything we currently teach to kids is a waste of time, according to Good Education.

This provocative statement should, of course, be taken with a grain of salt. For one, Good is calling him a “reformer” which, to many teachers, is an epithet most commonly applied to people with little or no training in education who have an ulterior motive (often profit-driven). Second, being an artificial intelligence theorist makes him no better qualified to critique education than any other non-educator.

Nevertheless, he brings up several salient points. For example, he argues that much of what we teach kids (or how we teach it) seems irrelevant to their everyday life. This is often the case, and it has only gotten worse with the mania for accountability and standardized exams, which has led to increasing use of teacher-centered lessons and test preparation at the expense of engaging, inquiry-based lessons. Many districts have even gotten rid of science, arts and music to make room for even more test preparation.

However, Schank is not simply criticizing teaching to the test. Even traditional subjects that have held a sacrosanct position in schools’ course offerings are a load of malarkey in his mind. For example, he has called chemistry "a complete waste of time," arguing that no one really needs "to know the elements of the periodic table" or the "formula for salt," including doctors, who he incorrectly says do not use the chemistry they learned in college.

This is a ridiculous assertion. Doctors use their chemistry daily when considering which drugs to prescribe and how they might interact with other drugs the patient might be taking. Understanding how Prilosec helps alleviate digestive problems, for example, requires an understanding of acid/base chemistry as well as the biochemistry of protein channels and enzymes.

A basic understanding of chemistry has important day to day applications, even for people who never take another science course in their lives. It is applicable to cooking, maintenance of common equipment (e.g., cars), health and safety. For example, an understanding of acid/base chemistry can prevent serious accidents at home or work when working with common cleaning materials, while a little biochemistry can go a long way toward understanding nutrition and diet, or the safety and proper usage of prescription and over the counter medicines.

One of the most important arguments in favor of chemistry is that it provides important prerequisite knowledge necessary for understanding much of the life sciences content standards, which have become very heavily weighted toward molecular biology and biochemistry over the past decade. Thus, if we are going to teach high school chemistry, it should come before biology. Unfortunately, few schools teach science in this sequence. Schank, by the way, supports the continuation of biology as a high school course, as long as we change how it is taught, which I favor, too.

On the other hand, the California content standards for chemistry require that students learn far more detail than most people will ever need. Unless continuing on to study higher level sciences, for example, no one really needs to know electron orbital configurations or Le Chatelier’s principle. By removing some of these more detailed concepts from the standards, teachers could have more time and freedom to implement curriculum that covers standards that are more relevant to everyday life and in a way that directly ties the content to real world problems.

Another problem is that science is often taught as a series of facts, rather than a process of inquiry. While many of the facts are indeed important, what really makes science useful (and fun) is its ability to answer questions and make accurate predictions about natural phenomena, something that can and should be taught at school. Sadly, this is rarely the case in K-12 science education. Most science teachers, when they assign lab activities at all, rely heavily on “cookbook” or proof-of-concept activities and demonstrations in which the results are known beforehand. Students are rarely allowed to generate original data, develop their own testable research questions or design their own experiments. It is likewise rare that science teachers have students peer review each other’s lab reports or read and critique articles from scientific journals and popular science magazines, something that can hone both their literacy and critical thinking skills.

One important reason for continuing to teach science is that scientific thinking and analysis can effectively be applied to many nonscientific situations. However, this is only true when science teachers spend time teaching the scientific process, how to analyze data and create graphs, control variables and design good experiments. For example, the news media often publish sloppy graphs and data that are poorly formatted, missing pertinent information or lacking a thorough description of the method of data acquisition. Someone who has had a good science education ought to be able to catch these problems and recognize that the conclusions drawn from such data might be inaccurate or exaggerated. In contrast, those lacking such training may buy all sorts of social snake oil, like the notions that poverty doesn’t affect student academic outcomes or that student standardized test scores are an effective and accurate way to evaluate teachers.

Schank correctly points out that the history textbooks are full of untruths. In reality, though, all subjects taught in school are subject to the biases of the ruling elite to some extent. However, this is most apparent in history and social studies courses. That does not mean that history should not be taught. Teachers do not have to use the textbooks at all or they can use the texts and add their own commentary. They can use the portions they find accurate and useful. They can even use the inaccurate parts to teach their students about bias.

One of Schank’s beefs with history is that U.S. presidents keep repeating the mistakes of the Vietnam War. By this we can presume he is talking about the conflicts in Iraq and Afghanistan. However, this criticism betrays his own misunderstanding of both history and politics. Politicians did indeed learn from the mistakes of Vietnam: Don’t have a draft; Do most of the killing from the air to minimize troop casualties; Subcontract most of the work out to private contractors; Keep independent journalists away; and Do as much of the dirty work as possible in secret. On the other hand, why should politicians give a damn about history? Even if they can’t “win” outright, wars are still profitable and they still help maintain our geopolitical dominance. But this interpretation of history will never make it into the history books as it conflicts with the myth that America is the world’s greatest proponent of democracy and freedom.

Wednesday, September 19, 2012

Research Shows: Evaluating Teachers Based on Student Tests is Bad for Students



“. . . two years ago, EPI [Economic Policy Institute] assembled a group of prominent testing experts and education policy experts to assess the research evidence on the use of test scores to evaluate teachers. It concluded that holding teachers accountable for growth in the test scores of their students is more harmful than helpful to children’s educations. Placing serious consequences for teachers on the results of their students’ tests creates rational incentives for teachers and schools to narrow the curriculum to tested subjects, and to tested areas within those subjects. Students lose instruction in history, the sciences, the arts, music, and physical education, and teachers focus less on development of children’s non-cognitive behaviors — cooperative activities, character, social skills — that are among the most important aims of a solid education.”

There are many ed “reformers” who argue that the above losses are negligible or insignificant or that holding teachers accountable for their students’ standardized test scores trumps all else. In response, Rothstein debunked several prominent studies that these reformers have used to support their position. In the first of these studies, the Gates Foundation found that teachers who earned high Value Added Measures (VAM) scores on their students’ standardized basic skills tests also tended to earn high VAM scores on internal tests of reasoning, suggesting that the narrowing of the curriculum as a result of high stakes testing was not impairing students’ development of reasoning skills, nor their teachers’ abilities to teach these skills. Rothstein’s take on this study follows:

But although the teacher results were correlated, they were only weakly correlated. True, more teachers who had high value-added scores on a basic skills test also had high value-added scores on a test of reasoning, but it wasn’t many more. If you fired teachers who did poorly at teaching basic skills you would get rid of many teachers who did poorly at developing reasoning skills, but you would also get rid of many teachers who did well at developing reasoning skills. The first group (those who did poorly) would be larger than the second group (those who did well), but not much larger.
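Rothstein’s point about weak correlation can be illustrated with a small simulation. The sketch below is not drawn from the Gates study or from Rothstein’s analysis: the correlation of 0.3, the firing cutoff (bottom quarter on basic skills) and the function name simulate_firing are all assumptions chosen only to show the shape of the argument. It generates two weakly correlated scores per hypothetical teacher, "fires" the bottom quarter on basic skills, and reports what share of those fired were below or above the median at developing reasoning.

```python
import random


def simulate_firing(rho, n_teachers=100_000, seed=0):
    """Fire the bottom quarter of teachers on a basic-skills score and
    report how they split around the median reasoning score.

    rho is an ASSUMED correlation between the two scores; it is not a
    figure taken from any study.
    """
    rng = random.Random(seed)
    teachers = []
    for _ in range(n_teachers):
        basic = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Standard construction of a score with correlation rho to `basic`.
        reasoning = rho * basic + (1.0 - rho ** 2) ** 0.5 * noise
        teachers.append((basic, reasoning))

    # Bottom quarter on basic skills gets fired.
    teachers.sort(key=lambda t: t[0])
    fired = teachers[: n_teachers // 4]

    # Approximate median reasoning score across all teachers.
    reasoning_scores = sorted(t[1] for t in teachers)
    median_reasoning = reasoning_scores[n_teachers // 2]

    weak = sum(1 for _, r in fired if r < median_reasoning)
    strong = len(fired) - weak
    return weak / len(fired), strong / len(fired)


weak_share, strong_share = simulate_firing(rho=0.3)
print(f"Fired teachers below the median on reasoning: {weak_share:.0%}")
print(f"Fired teachers above the median on reasoning: {strong_share:.0%}")
```

Under these assumed numbers, the fired group does contain more weak reasoning teachers than strong ones, but a large minority of those fired are above the median at developing reasoning. Raising or lowering rho changes the split, which is the whole argument: the weaker the correlation, the more good reasoning teachers are lost.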

A second well-known study, done by a group of Harvard researchers, found that students of teachers with high value-added scores also had better long-term adult outcomes, such as higher incomes. If this is true, it would mean that the tests are somehow correlated with financial success, something that many parents and the public at large would likely support. However, according to Rothstein:

The flaw here is that the researchers were unable to compare the long term results of high value-added teachers with results of teachers who excelled in other ways that might, conceivably, have even larger impacts on long-term outcomes. For example, the researchers could not say whether teachers who are more effective at developing their students’ cooperative behavior, or reasoning skills (and we know from the Gates study that only sometimes are these the same teachers who are more effective at teaching basic skills) might have students who have even better adult outcomes—like earnings. If this were the case (and we have no reason to believe it one way or the other), then getting teachers to shift their attention from teaching reasoning or cooperative behavior to standardized test preparation might be lowering their students’ future earnings, not raising them.


Now that more and more data are coming out refuting or drawing into question the validity of Value Added Measures due to their unreliability and inconsistency (see here, here and here), some “reformers” are backing off on demands to tie teacher evaluations entirely to student test data, instead calling for the “reasonable” compromise of using “multiple” measures that include the test data.

This, too, is bunkum. If the test scores are unreliable, then they should not count at all. If the emphasis on tests is bad for students in the long term, the tests should be abandoned.

But just for the sake of argument, let’s assume that a tiny 1% of teachers’ evaluations are based on their students’ test scores. Even this seemingly infinitesimal amount can have an enormous impact on how a teacher is evaluated. For example, the most common and traditional method for evaluating teachers is for an administrator to go into the classroom and observe the teacher. In many school districts, this still accounts for the majority of a teacher’s evaluation. Yet, if an administrator knows her teachers’ VAM scores beforehand, she would be biased when making her observations, thus undermining the validity of the bulk of the evaluation, not just the 1% directly linked to student test scores.