Sunday, July 15, 2012

Five hidden facets of numbers

We live in a world flooded with numbers. News stories, nutrition labels, results from research studies, and even sports highlights are riddled with numbers! Numbers are comforting; they seem solid and dependable.

“The numbers don’t lie.” But what if they do? Numbers can be manipulated to tell contradictory stories. Here are five things to think about next time you see a reported number.

1. Error Margin
2. Sample Size
3. Sample Bias
4. Replication
5. Rounding Errors

1. Error Margin
Scientists obsess over error margins. One source of error is the measuring tool itself. Every measurement carries some uncertainty because no tool is infinitely precise.

Imagine that you have a foot-long ruler with no inches marked. Any measurement will be precise only to the nearest foot; the number of inches will be a guess. A height of about 5.5 feet would be reported as 5.5 feet plus or minus 0.5 feet.
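Here is a rough sketch of that ruler in Python. It assumes one particular convention (report the midpoint between the two marks that bracket the length, with half the mark spacing as the margin); the function name and that convention are illustrative choices, not a standard.

```python
def report_measurement(true_length_ft, tick_spacing_ft=1.0):
    """Read a length off a ruler whose marks are tick_spacing_ft apart.

    Assumed convention: the best estimate is the midpoint between the
    two marks that bracket the length, and the margin is half the spacing.
    """
    lower_mark = (true_length_ft // tick_spacing_ft) * tick_spacing_ft
    estimate = lower_mark + tick_spacing_ft / 2
    margin = tick_spacing_ft / 2
    return estimate, margin


estimate, margin = report_measurement(5.7)
print(f"{estimate} plus or minus {margin} feet")
```

Anyone between 5 and 6 feet tall gets the same reading, which is exactly why the margin belongs next to the number.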

Unfortunately, these error margins are often neglected when a number is quoted.

2. Sample Size
One out of one writers agree that sample size is important.

Scientists use large sample sizes to account for the individual differences between people or animals.

Imagine you take your foot-long ruler and try to find the average American height by measuring the first ten people you meet. You find the average height to be 6.25 feet. The danger comes when a headline reads "The average person is 6.25 feet tall!" without mentioning the small sample size.
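To see why ten people are not enough, here is a small simulation. The population is made up (heights drawn from a bell curve with a mean of 5.5 feet and a standard deviation of 0.3 feet), and the function name is hypothetical. It repeats the survey many times and reports how far apart the lowest and highest survey averages land.

```python
import random
import statistics


def sample_mean_spread(sample_size, trials=200, seed=0):
    """Repeat a height survey `trials` times and return the gap (in feet)
    between the lowest and highest survey averages."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Hypothetical population: mean 5.5 ft, standard deviation 0.3 ft.
        heights = [rng.gauss(5.5, 0.3) for _ in range(sample_size)]
        means.append(statistics.mean(heights))
    return max(means) - min(means)


print("10 people per survey:  ", sample_mean_spread(10))
print("1000 people per survey:", sample_mean_spread(1000))
```

The averages from ten-person surveys wander much more than those from thousand-person surveys, so a single small survey can easily land on a misleading number.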

Humans love seeing patterns. We tend to use one example as evidence for some larger truth: “my neighbor was sick, but then she ate some sunflower seeds and got better. Hey, sunflower seeds cure sickness!”

The falsity of this pattern is summarized beautifully here: “The plural of anecdote isn’t data.”

3. Sample Bias
General claims are only useful if they are based on samples representative of the whole population.

One famous example: researchers are now realizing that the heart attack signs typical in men (e.g., pain in the chest) are not always the same in women. Most heart attack studies were done using only male subjects, so the textbook signs don't necessarily apply to half of the population.

Sample bias is also important for studies that take volunteers. The results of a phone poll will be skewed by the fact that certain types of people are more likely to be at home to receive the pollster's calls.
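The phone poll effect can be worked out with a little arithmetic. Every number below is invented for illustration: two groups of people, how likely each is to pick up the phone, and how many in each group support some proposal.

```python
# Hypothetical groups:
# (share of population, chance of answering the phone, share who say yes)
groups = {
    "out during the day": (0.6, 0.2, 0.40),
    "home during the day": (0.4, 0.8, 0.70),
}


def true_support(groups):
    """Support across the whole population."""
    return sum(share * yes for share, _, yes in groups.values())


def polled_support(groups):
    """Support among only the people who actually answer the phone."""
    reached = sum(share * answer for share, answer, _ in groups.values())
    said_yes = sum(share * answer * yes for share, answer, yes in groups.values())
    return said_yes / reached


print(f"whole population: {true_support(groups):.0%}")
print(f"phone poll:       {polled_support(groups):.0%}")
```

With these made-up figures the poll reports about 62% support when the real figure is 52%, simply because the people at home answer more often.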

Any time you see a number referring to some population result, try to find out the identities of the subjects.

4. Replication
Sometimes, a number seems too good to be true. One question to ask is whether it can be reproduced.

Let’s use a political race as an example. Five independent polls take place. Four of them find that candidate A has a big lead, but one states that candidate B has a big lead. Looking at all the results, you conclude that the fifth poll was erroneous. But if you had only the result from the fifth poll, you would not have an accurate reading of the political race.
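One simple way to compare the polls is to take the middle value and flag anything far from it. The leads below are invented, and the 5-point tolerance is an arbitrary choice for this sketch, not a polling standard.

```python
import statistics

# Hypothetical results: candidate A's lead in percentage points
# (a negative number means candidate B leads).
poll_leads = [8, 7, 9, 6, -6]


def consensus_lead(leads):
    """The median lead, which one stray poll cannot drag very far."""
    return statistics.median(leads)


def outliers(leads, tolerance=5):
    """Polls that differ from the median by more than `tolerance` points."""
    center = statistics.median(leads)
    return [lead for lead in leads if abs(lead - center) > tolerance]


print("consensus lead for A:", consensus_lead(poll_leads))
print("polls that disagree: ", outliers(poll_leads))
```

With all five results in hand, the odd poll stands out. With only that one poll, there is nothing to compare it against.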

This situation comes up most often when a single lab publishes a sensational scientific result that other labs cannot reproduce, such as the claimed autism-vaccine connection.

Be excited about new results, but treat them with a healthy amount of skepticism.

5. Rounding Errors
Spray butter: a delicious condiment used on corn-on-the-cob or baked potatoes. A friend told me once that she could eat as much spray butter as she wanted. The reason: the nutrition label stated that there were zero calories.

Why does such a statement exist? If there are fewer than five calories in a serving, the FDA allows a company to label that as zero calories per serving.

The culprit in this case is a rounding error. If one spray (one serving) has 3 calories, the label rounds that down to zero. Five sprays, however, contain 5*3=15 calories, not 5*0=0 calories.
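The same arithmetic as a short sketch. The function is a simplified stand-in for the labeling rule described above (anything under five calories is shown as zero); it is not the full FDA rounding scheme.

```python
def labeled_calories(actual_per_serving):
    """Simplified label rule from the text: under 5 calories is shown as 0."""
    return 0 if actual_per_serving < 5 else actual_per_serving


sprays = 5
actual_per_spray = 3  # the example figure used above

label_total = sprays * labeled_calories(actual_per_spray)
real_total = sprays * actual_per_spray

print("calories according to the label:", label_total)
print("calories actually eaten:        ", real_total)
```

Rounding first and multiplying second gives 0; multiplying the real value gives 15.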

Errors propagate every time you round a number, so scientists round only at the last step of a problem.

These five features of numbers should help you navigate the number-filled world we live in.

1 comment:

  1. Woot, first comment. Very nice. I would add that even given well collected, processed, and annotated data, error often occurs in the human being interpreting the data into false causation. I would go as far as to say that this is the most common method used to manipulate the outcome of a test when there is a strong emotional investment in the results.