Thursday, June 23, 2011

Data Driven Dribble (Statistics by Idiots)

Over the past decade there has been a growing obsession with using data to inform our decision-making, in everything from medicine to education. While this sounds scientific and superior to guessing or continuing to do things a particular way because “that’s the way we’ve always done it,” if the data is not collected and interpreted correctly, it ends up being just another way for the bosses to get what they want and to get us to go along with it.

GamesWithWords posted the following on the Field of Science blog:

“Congressman Jack Kingston explains that the nation's food supply is "99.99 percent safe." Politifact says, "That sounds great, but is it true?"

Actually, it doesn't sound that good to me. Suppose Kingston means that you only have a 0.01% chance of getting ill any particular time you eat (which seems to be the case). And let's say people eat 3 times a day. That gives you a 10.4% chance of getting sick any given year. I'd rather not get sick at all, particularly when many of the illnesses are easily preventable.”
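The arithmetic behind that 10.4% figure can be checked in a few lines. This is only a sketch: it assumes, as the quote does, a 0.01% risk per meal and three meals a day, and it further assumes a 365-day year and that each meal is an independent event. The function name annual_risk is made up for illustration.

```python
def annual_risk(per_meal_risk: float, meals_per_day: int = 3, days: int = 365) -> float:
    """Chance of getting sick at least once in a year.

    Assumes every meal carries the same risk and meals are independent,
    which is a simplification.
    """
    meals = meals_per_day * days
    # Probability of staying healthy through every meal, subtracted from 1.
    return 1 - (1 - per_meal_risk) ** meals

risk = annual_risk(0.0001)  # 0.01% per meal, i.e. "99.99 percent safe"
print(f"{risk:.1%}")  # prints 10.4%
```

The point is that a tiny risk repeated more than a thousand times a year stops being tiny.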

99.99% does sound great, which is precisely why people love to say it, even though they rarely have any evidence to support the number. They say it to convince us that they are right and that everything is okay. How can one argue against odds like that? Yet GamesWithWords cleverly points out that even a percentage as high as 99.99% is not necessarily so good. It is not just that you still have a 10.4% chance of getting sick, which is relatively high, but that the consequences of getting sick can be so terrible. The consequences of losing a bet are just as important as the odds. A 10% chance of winning $2 from a $1 wager can seem like a reasonable gamble, in part because the consequences of losing are trivial. On the other hand, the chance of having a child with Down syndrome is 0.8% for mothers age 40-44, and 3.6% for mothers over the age of 45. The odds seem low, but the consequences are significant.

There is also the issue of whether a particular statistic is even possible for a given variable. No Child Left Behind (NCLB) mandates that 100% of students become proficient in math and reading. On the surface this seems both desirable and achievable. However, when you consider that “proficient” is measured relative to other students, rather than against a fixed benchmark, it becomes clear that this statistic is not achievable. If every student makes some improvement, the “proficient” mark keeps getting higher. As a result, students who were below proficient, but who are still improving, will still be seen as failing under NCLB. Schools are also required to make “adequate yearly progress” for every subgroup of students (e.g., ethnicity, ELD, special education, socioeconomic status). So if all subgroups but one improve, the school may still be deemed a failure.

How the data is acquired is also important. In education and other social science studies, researchers often use surveys to acquire data. There are numerous ways surveys can be abused, biased or manipulated to yield inaccurate interpretations of reality. For example, a random sample of subjects who are similar in age, ethnicity, socioeconomic status and health will usually yield more accurate results than one in which these and other variables are not controlled. Likewise, self-selecting surveys (i.e., participants choose whether to participate or not) tend to be more biased than simple random sampling surveys, in which all members of a group have an equal random chance of being selected for the survey. This is because certain groups may select themselves out of a self-selecting survey, thus removing their responses and beliefs from the results.

In a school I taught at, the administration gave a self-selecting survey to determine how the staff viewed them. There was already considerable mistrust of the administration and the survey was not entirely anonymous. Many teachers refused to participate in the survey. If these teachers felt abused, slighted or frightened, then the survey results would likely be skewed toward approval of the administration, giving the administrators a false sense that everything was okay.
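To see how much non-response alone can distort a result like this, here is a small sketch with entirely made-up numbers (not data from the school described above): 60 unhappy teachers who rate the administration 2 out of 5 and mostly stay silent, and 40 content teachers who rate it 4 out of 5 and mostly respond. The scores, head counts and response rates are all hypothetical.

```python
# Hypothetical staff: (approval score on a 1-5 scale, head count, response rate).
# Every number here is invented purely for illustration.
groups = [
    (2, 60, 0.20),  # unhappy teachers, few of whom fill in the survey
    (4, 40, 0.80),  # content teachers, most of whom fill it in
]

def mean_score(rows):
    """Average score over (score, number_of_people) pairs."""
    total = sum(score * count for score, count in rows)
    people = sum(count for _, count in rows)
    return total / people

# What the whole staff actually thinks.
true_mean = mean_score([(score, count) for score, count, _ in groups])
# What the survey reports when only the self-selected respondents are counted.
survey_mean = mean_score([(score, count * rate) for score, count, rate in groups])

print(f"all staff: {true_mean:.2f}, respondents only: {survey_mean:.2f}")
# prints: all staff: 2.80, respondents only: 3.45
```

Nobody in this example lies on the survey. The rosier average comes entirely from who chose to answer.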

Another way statistics can be abused is to read more into them than they really say. For example, a student survey at a school I taught at indicated that students felt rushed and harried by the frenetic pace of the school day. The administration used this statistic to justify changing the bell schedule. However, there are numerous ways to interpret the data, and not all of them point to a change in the bell schedule. For example, maybe students were fine with the existing bell schedule, but wanted their teachers to build in more time for review and questions and to process what they had learned. Maybe they wanted less homework, a longer lunch and longer passing periods.

The idea of using data to assess teachers has become particularly popular recently. It seems much more objective than having an administrator make subjective observations after a cursory classroom visit. However, what kind of data is available that could accurately and objectively assess a teacher’s ability? The one most often proposed is student test scores. If a teacher is doing a good job, then students’ scores should improve, right?

The problem with this is that student test scores do not measure teachers. They measure students. And students can do well or poorly on tests for numerous reasons that have nothing at all to do with their teachers. In fact, the most significant influence on student test scores is their family’s socioeconomic status. Wealth influences not only how well students do on tests and in school in general, but also how much they will benefit from a given teacher. A student with low self-efficacy, malnourishment, physical discomfort from untreated maladies, no quiet place to study and no one to monitor and assist with studying may have trouble in school regardless of a teacher’s skill. A teacher who has exceptional talent, but many low-income students who come to class sporadically, read far below grade level and seldom study, could easily see stagnant student test scores and consequently appear to be a terrible teacher.
