Dangerous Equation: The Standard Error of the Mean

On your reading list is the article: Wainer, H. (2007). The most dangerous equation. American Scientist, 95, 249-256.

Wainer opines that the most dangerous equation is that for the standard error of the mean, which you know is $\sigma_M = \sigma / \sqrt{n}$, the population standard deviation divided by the square root of the sample size. Wainer argues that not understanding this expression leads to all sorts of mischief. How could such a simple expression be associated with mischief?

As you know, this dangerous equation indicates that a population of sample means will be more variable the smaller the size of each sample. Suppose that three very bored people took the time to obtain 1,000,000 random samples of the weight of college students. Person A had only one score in each sample. Person B had four scores in each sample. Person C had 1,000 scores in each sample. You would, I hope, be not at all surprised to learn that the variability of the sample means would be greatest for Person A and smallest for Person C. You would also not be surprised to see that Person A frequently obtained means that were very low or very high, but Person C only very rarely obtained means that were very low or very high, C’s means all being tightly centered about the population mean.
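
The thought experiment above can be sketched in a few lines of Python. This is only an illustrative simulation under assumptions that are not in Wainer's article or in the text above: weights are taken to be normally distributed with a made-up mean of 150 pounds and a made-up standard deviation of 30 pounds, and each person draws 1,000 samples rather than 1,000,000 so that the script finishes quickly. The function names are hypothetical.

```python
import math
import random
import statistics

POP_MEAN = 150.0   # hypothetical population mean weight (pounds)
POP_SD = 30.0      # hypothetical population standard deviation (pounds)
N_SAMPLES = 1000   # far fewer than 1,000,000, to keep the run short


def sd_of_sample_means(n, n_samples=N_SAMPLES, rng=None):
    """Draw n_samples random samples of size n; return the SD of their means."""
    if rng is None:
        rng = random.Random(0)
    means = [
        statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))
        for _ in range(n_samples)
    ]
    return statistics.pstdev(means)


def theoretical_se(n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return POP_SD / math.sqrt(n)


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so the run is repeatable
    for person, n in (("A", 1), ("B", 4), ("C", 1000)):
        observed = sd_of_sample_means(n, rng=rng)
        print(f"Person {person}: n = {n:4d}, "
              f"SD of sample means = {observed:6.2f}, "
              f"sigma / sqrt(n) = {theoretical_se(n):6.2f}")
```

Under these assumptions the equation gives standard errors of 30 for Person A, 15 for Person B, and about 0.95 for Person C, and the simulated standard deviations of the sample means should land close to those values.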

Now consider rates of kidney cancer in counties of the USA. Each of these rates is basically a proportion, the number of cases of kidney cancer in a county divided by the total number of persons in the county, and proportions are essentially means: random variable Y has value 0 if a person does not have cancer, Y has value 1 if a person does have cancer. The mean of Y is the proportion of the total sample that does have cancer. As shown by Wainer, if you identify the counties that have the lowest rates of kidney cancer in the USA, counties with low populations are over-represented. Why? Might there be something protective about living in a rural county?

Hold on now. This observation may be due to nothing other than our dangerous equation. Counties with small populations are just like samples with small n. Our dangerous equation would lead us to expect that, compared to large counties, the small counties are over-represented not only among those with very low rates of kidney cancer but also among those with very high rates of kidney cancer, and, sure enough, that is exactly what we find. It would have been a dangerous mistake to have concluded that the data on hand indicate that (with respect to kidney cancer) you are likely to be healthier if you live in the boonies than if you live in the big city.

Wainer gives other examples of how ignorance of the dangerous equation has led to faulty inferences. One is the observation that small schools are over-represented among those whose students score highest in tests of academic achievement, and the inference that it would be a good idea to break up large schools into smaller schools. The data, however, also show that small schools are over-represented among those whose students score lowest in tests of academic achievement, and that on average large schools out-perform small schools.

Karl L. Wuensch,

August, 2010.