Back of the Envelope

Outliers in statistical data
Despite my best efforts, I've been having difficulty explaining, in my comments on my previous God or Not Carnival Thoughts post, why outliers make it difficult to tell whether there is a real trend in data. So I'll demonstrate by example. The following graph shows the number of murders per 100,000 people in each state in the year 2000, plotted against the percentage of the vote that Gore got in each state that year. As you can clearly see, voting for Gore causes more murders per capita.

[Graph: murders per 100,000 vs. Gore's percentage of the vote, by state, 2000]

"Wait a second!" you say. "There doesn't seem to be any pattern."

"But look at that one point, the one where 90% of the voters voted for Gore, and the murder rate is 42 per 100,000. Isn't that striking? See how clear it is!"

"But it's only one point!"

"But look how big it is!"

That's essentially what my argument with Athana has been like. She refers to a study in the Journal of Religion and Society that argues that religion is bad for nations. As proof, it plots the US, with its high rates of abortion, murder, teenage pregnancy, venereal disease, et cetera, against a whole bunch of other first-world nations, with religious belief on the x-axis. The problem is that if you remove the US, which is way more religious than the other nations and also has more murders, abortions, teenage pregnancies, et cetera, most of the graphs don't show any discernible pattern. (The exceptions, which look very linear, are abortions, teenage pregnancies, and under-5 mortality; that is enough to start an interesting discussion, and I think all three are symptoms of the same underlying characteristic.) Thus my argument is that the US is too much of an outlier to include, because its differences from the other nations go well beyond religiosity: it lacks their social welfare programs, is more ethnically and culturally diverse, places a higher value on individual freedom as opposed to community conformity, and the list goes on and on. A real trend in a set of data should survive the removal of any single point. If removing one point, or even a couple, eliminates the trend, then it isn't real.
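To make that test concrete, here is a minimal Python sketch of a leave-one-out check on the Pearson correlation coefficient: drop each point in turn, recompute r, and see whether the trend holds up. The numbers are invented for illustration and are not the real 2000 state figures or the study's data: eight points with no relationship at all, plus one extreme point playing the role of the 90%, 42-per-100,000 point in my plot. The function names pearson_r and leave_one_out are just my own labels.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def leave_one_out(xs, ys):
    """Recompute r with each point removed in turn; entry i omits point i."""
    results = []
    for i in range(len(xs)):
        sub_x = xs[:i] + xs[i + 1:]
        sub_y = ys[:i] + ys[i + 1:]
        results.append(pearson_r(sub_x, sub_y))
    return results


# Hypothetical data, invented for illustration (not real state figures).
# The first eight points are arranged so that r is exactly zero among them;
# the last point is a single extreme outlier.
xs = [40, 45, 55, 60, 40, 45, 55, 60, 90]
ys = [4, 7, 7, 4, 7, 4, 4, 7, 42]

print(f"r with every point: {pearson_r(xs, ys):.2f}")

loo = leave_one_out(xs, ys)
# Find the single removal that weakens the correlation the most.
weakest = min(range(len(loo)), key=lambda i: abs(loo[i]))
print(f"dropping point {weakest} gives r = {loo[weakest]:.2f}")
```

With these made-up numbers, every leave-one-out correlation stays above 0.8 except the one that drops the extreme point, where r falls to zero. The whole "trend" lives in that single point.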

Going back to my plot and its striking outlier: that outlier is Washington, D.C. "Wait a second," you say. "That's not a state." For the purposes of voting and crime statistics it is, which is why I included it. It also has a larger population than Wyoming. "But," you continue, "it's one large urban center, mostly poor and disadvantaged. It's an outlier!" And that, I believe, is my point.
