Thursday, November 22, 2012

Data Driven Nonsense from Harvard and the Gates Foundation

A new Gates-funded Harvard study has found that Los Angeles Unified School District (LAUSD) teachers vary substantially in quality (more than in other districts) and that the district disproportionately places inexperienced teachers in lower-performing classrooms (as other districts do). The study, Human Capital Diagnostic, was conducted by the Strategic Data Project (SDP), which is affiliated with Harvard University’s Center for Education Policy Research.

The biggest problem with this study is that it is a bunch of nonsense.

Let’s start with the authors’ most profound claim: The best teachers in LAUSD provide the equivalent of eight additional months of instruction during the school year compared with the district’s worst teachers. Since their research was based entirely on student scores on the California Standards Tests (CSTs), a high-stakes exam used to rank schools, and the top teachers in the study were the ones with the largest student gains on these tests, what they are really saying is that the best teachers provided the equivalent of eight additional months of test prep.

Big wow!

The authors state that there is “no specific cut-off for determining whether an effect size is large or small,” but they assert that an effect size of 0.2 standard deviations is considered large in education research. The study found that the difference between a 25th and a 75th percentile teacher is one-quarter of a standard deviation (0.25). This would be significant if it were based on a meaningful measurement of teacher effectiveness. Unfortunately, all it really says is that some teachers are better than others at squeezing out student gains on an otherwise lousy exam. It does not tell us whether their students are becoming self-motivated, independent learners or competent critical thinkers and problem-solvers. Furthermore, the study provided no explanation for how it determined that 0.25 standard deviations was equivalent to eight months of instruction.
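For what it’s worth, the usual back-of-the-envelope conversion in studies of this kind (my assumption here, since the SDP report does not show its work) is to divide the effect size by a typical year’s worth of test-score growth, often taken to be roughly 0.3 standard deviations, and multiply by the length of the school year:

```python
# Hypothetical reconstruction of a "months of learning" conversion.
# The 0.3-SD-per-year growth figure is an assumed benchmark, not a number
# taken from the SDP study itself.
TYPICAL_ANNUAL_GAIN_SD = 0.3   # assumed annual test-score growth, in SDs
SCHOOL_YEAR_MONTHS = 9         # instructional months in a school year

def months_of_learning(effect_size_sd):
    """Convert an effect size (in SDs) into equivalent months of instruction."""
    return effect_size_sd / TYPICAL_ANNUAL_GAIN_SD * SCHOOL_YEAR_MONTHS

# 25th-vs-75th percentile teacher gap: about 7.5 months
print(round(months_of_learning(0.25), 2))
# TFA effect of 0.05 SD: about 1.5 months
print(round(months_of_learning(0.05), 2))
```

Under those assumed numbers, 0.25 standard deviations works out to roughly seven and a half months, close to the study’s “eight additional months,” and 0.05 to about a month and a half. Which is exactly the point: the headline figure depends entirely on an annual-growth assumption the study never states.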

The authors also claim that Teach for America (TFA) and Career Ladder teachers have higher effects on their students than other novice teachers, by 0.05 and 0.03 standard deviations respectively, and they even attribute a gain of one to two months of additional learning to these relatively small effects. They make similar claims for National Board Certified teachers, whose students’ test gains were 0.03–0.07 standard deviations higher than those of other teachers. Yet, if an effect size of 0.2 standard deviations is considered large in education, then effects of 0.03–0.07 ought to be considered small or even insignificant.

While the effect size may be insignificant, the fact that this was being researched in the first place is not. TFA provided 13% of new hires to the district over the past six years (according to the study’s authors), and it would be of great interest to the district’s administrators to show that the investment was worthwhile. So let’s assume, for the sake of argument, that the difference between TFA recruits and other novice teachers was significant. What would this mean? TFA teachers may in fact be more willing than other novice teachers to work long, unpaid overtime hours and substitute “drill and kill” style teaching for quality student-centered instruction, both of which could produce higher test scores without improving the quality of student learning.

Perhaps a bigger problem with this study (like all studies and reforms based on student test data) is that numbers are not the only relevant type of data in education, and sometimes not even the best. Esther Quintero, writing for the Shanker Blog, describes the “streetlight effect,” from the parable of the drunk who searches for his lost wallet under the streetlight, not because he lost it there, but because the light is better there and it would be easier to find it if it happened to be there. Student test data is easy to access now that it is required of every district in the U.S. under No Child Left Behind (NCLB)—it is under the streetlight. Yet, at best, it is only a proxy or very rough estimate of teacher quality, since it considers only a small part of what teachers are expected to do.

Quintero also correctly points out that NCLB has helped to institutionalize what counts as data. “Scientifically-based research” is now limited to standardized test scores, which, as it turns out, are not particularly scientific. Case studies, ethnographies, teacher observations and portfolios, and other qualitative data are considered unacceptable.

One promising finding from the study was that teacher performance after two years is a good predictor of future effectiveness. In other words, the current system of granting teachers tenure after two years of good evaluations makes sense. Teachers are not getting worse after two years, novice teachers are not better than veterans and should not have the right to bump them during layoffs, and LAUSD is not top-heavy with a bunch of cranky veterans who can no longer teach.
