Monday, December 24, 2012

Three C’s in Science: Correlation, Confound, and Causation

If you want to irritate a scientist, loudly announce that because two things are connected, one must cause the other.

Two ice cream friends!

For example, below I show a fake chart of the number of clothing items worn against the number of ice cream scoops consumed.

 Graph of the number of clothing items worn vs. number of ice cream scoops consumed. Does this mean that not wearing clothing makes you more likely to eat ice cream?

As you can see, there is an apparent trend: the more ice cream people eat, the less clothing they wear.

And as is famously said, “Correlation does not imply causation.”

Two things are correlated when they occur together more often than chance alone would predict. For example, ice cream and a lack of clothing, or little kids and sticky hands.
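To put a number on "occurring together," one common measure is the Pearson correlation coefficient, which runs from -1 (perfectly opposite) to 1 (perfectly in step). Here is a minimal Python sketch. The six data points are made up, just like the chart above, and `pearson` is simply a helper name I picked for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up data: clothing items worn and ice cream scoops eaten by six people.
clothing_items = [8, 7, 6, 4, 3, 2]
ice_cream_scoops = [0, 1, 1, 2, 3, 4]

r = pearson(clothing_items, ice_cream_scoops)
print(f"r = {r:.2f}")  # strongly negative: less clothing goes with more scoops
```

A value near -1 or 1 only says the two lists move together. It says nothing about why.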

In fact, it is pretty easy to find correlations between all sorts of random pairs of variables. For one famous example, check out pirates and global warming (http://en.wikipedia.org/wiki/Flying_Spaghetti_Monster#Pirates_and_global_warming). The claim is that diminishing numbers of pirates cause global warming.

However, two things can occur together purely by coincidence, or a third variable can be driving both of them.

This third variable is known as a hidden variable or confound (http://en.wikipedia.org/wiki/Confounding).

A confound is another variable that actually drives both of the variables you measured. It is often a cause that you wouldn't have otherwise connected with the data.

In this case, the hot temperatures are most likely causing people to eat more ice cream and be more naked (both good things).
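You can watch a confound at work in a small simulation. The Python sketch below assumes a toy model that I made up for illustration: temperature nudges ice cream up and clothing down, and otherwise the two have nothing to do with each other. All of the numbers and helper names (`pearson`, `residuals`) are hypothetical. The trick is to subtract out a straight-line fit on temperature and then check whether any correlation is left over.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]

random.seed(0)  # fixed seed so the run is repeatable

# Assumed toy model: temperature (in degrees C) is the only shared cause.
# Values are unitless toy numbers and can dip below zero.
temps = [random.uniform(0, 35) for _ in range(2000)]
scoops = [0.1 * t + random.gauss(0, 1) for t in temps]
clothing = [8 - 0.15 * t + random.gauss(0, 1) for t in temps]

raw_r = pearson(scoops, clothing)
adjusted_r = pearson(residuals(scoops, temps), residuals(clothing, temps))

print(f"ignoring temperature:        r = {raw_r:.2f}")
print(f"after removing temperature:  r = {adjusted_r:.2f}")
```

In this made-up world, the raw correlation is clearly negative, but it shrinks to roughly zero once temperature is accounted for. Real data is rarely this tidy, and you can only adjust for a confound you thought to measure.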

Confounds are a big deal for scientists because a confound can ruin all of your data if you don’t discover it before you publish your paper.

Many people do not think about confounds when talking about correlations. This does not mean that correlations are not useful, but rather that there is often some underlying cause connecting the variables. 

So when you hear reports about a connection between two variables, keep in mind that one might not cause the other. However, feel free to feign ignorance and piss off any annoying scientists near you.

