From the Exploring Data website - http://curriculum.qed.qld.gov.au/kla/eda/
© Education Queensland, 1997

I'm Not Mad About MAD

For years I've shuffled around when students have asked me, 'Why don't we just use the mean deviation rather than the standard deviation? It's so much easier to understand.' And when asked to explain why we divide by n-1 rather than n when finding the variance of a sample, I've mumbled things about degrees of freedom without having the foggiest idea what I was talking about. But no longer!

Paul Gardner of Monash University posted an email to the edstat-l mailing list which provides clear explanations of both of these topics. I have made some minor editing changes to Paul's email, as the audience for this article is not the audience of Paul's original post.

The original question

The books say that the Mean Absolute Deviation (MAD) doesn't lend itself to further use making inferences about the underlying population, but that the Variance does. Why is that?

From: Paul Gardner

This is not a proof, but perhaps this exercise may be convincing.

Consider the population of numbers on one die, 1 2 3 4 5 6. The mean is 3.5, the variance is 2.917 and the mean absolute deviation is 1.5.
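These three population values are easy to check. The following is only a minimal sketch in Python (the variable names are my own, not Paul's), using exact fractions so that rounding does not muddy the comparison.

```python
from fractions import Fraction

# Population: the six faces of one die.
faces = [1, 2, 3, 4, 5, 6]
n = len(faces)

# Population mean.
mean = Fraction(sum(faces), n)

# Population variance: mean of the squared deviations (divide by n).
variance = sum((x - mean) ** 2 for x in faces) / n

# Population mean absolute deviation: mean of the absolute deviations.
mad = sum(abs(x - mean) for x in faces) / n

print(float(mean), round(float(variance), 3), float(mad))  # 3.5 2.917 1.5
```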

Now take two dice, and consider all possible samples of size 2. There are 36 possible ordered outcomes, each equally likely, and together they make up every sample of size n=2 that can be drawn from the original population. The outcomes are 1,1; 1,2; 2,1; 1,3; 3,1; 2,2 ... etc.

Now find the mean absolute deviation within each sample of the individual dice values around their sample mean. Here are two examples. For 1, 1, the mean is 1, each deviation is 0 so the mean absolute deviation is 0. For 4,1, the mean is 2.5, each deviation is 1.5 so the mean absolute deviation is 1.5.

Doing this for all 36 outcomes yields a frequency distribution (written as count x value) of 6x0, 10x0.5, 8x1, 6x1.5, 4x2, 2x2.5.

The mean of this distribution is 35/36, i.e. 0.97. Thus the sample deviations yield, on average, an underestimate of the true value of the population deviation of 1.5. Statisticians say that the sample MAD is a biased estimator of the population MAD. MAD doesn't lend itself to making inferences about the underlying population.
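The enumeration over all 36 outcomes can be reproduced by brute force. This is a sketch only; the helper name sample_mad is hypothetical, chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sample_mad(sample):
    """Mean absolute deviation of the values around their own sample mean."""
    n = len(sample)
    m = Fraction(sum(sample), n)
    return sum(abs(x - m) for x in sample) / n


# All 36 ordered outcomes of two dice, i.e. every sample of size 2.
samples = list(product(range(1, 7), repeat=2))

mads = [sample_mad(s) for s in samples]
counts = Counter(mads)

# Frequency distribution, printed as count x value.
for value in sorted(counts):
    print(counts[value], "x", float(value))

# Average of the 36 sample MADs, to compare with the population MAD of 1.5.
mean_mad = sum(mads) / len(mads)
print(mean_mad, round(float(mean_mad), 2))  # 35/36 0.97
```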

What about Variance?

If one calculates variances instead of deviations, using SS/n (SS=sum of squared deviations from the sample mean, n=2), the average of the 36 sample values comes to 1.458, also an underestimate of the true value of 2.917, i.e. SS/n is a biased estimator of the population variance.

However, if one calculates SS/(n-1), the value obtained is exactly 2.917. SS/(n-1) is an unbiased estimator of the population variance. This is a general finding which can be proven algebraically.
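Both variance calculations can be checked over the same 36 outcomes. Again this is just an illustrative sketch; sum_sq is a made-up helper name, and the check covers only this dice example, not the general algebraic result.

```python
from fractions import Fraction
from itertools import product


def sum_sq(sample):
    """SS: sum of squared deviations of the values from their sample mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)


samples = list(product(range(1, 7), repeat=2))
n = 2  # size of each sample

# Average of SS/n over all 36 samples (the biased estimator).
biased = sum(sum_sq(s) / n for s in samples) / len(samples)

# Average of SS/(n-1) over all 36 samples (the unbiased estimator).
unbiased = sum(sum_sq(s) / (n - 1) for s in samples) / len(samples)

print(round(float(biased), 3))    # 1.458
print(round(float(unbiased), 3))  # 2.917, the population variance
```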

Can we try a similar trick with MAD? Doing the equivalent with the MAD (dividing each sample's sum of absolute deviations by n-1 instead of n) yields an average value of 1.94, an overestimate of the true MAD value of 1.5.
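The same brute-force check works for the n-1 version of the MAD. As before, this is a sketch with a hypothetical helper name (sum_abs), not a general result.

```python
from fractions import Fraction
from itertools import product


def sum_abs(sample):
    """Sum of absolute deviations of the values from their sample mean."""
    m = Fraction(sum(sample), len(sample))
    return sum(abs(x - m) for x in sample)


samples = list(product(range(1, 7), repeat=2))
n = 2  # size of each sample

# Divide by n-1 instead of n, then average over all 36 samples.
adjusted = sum(sum_abs(s) / (n - 1) for s in samples) / len(samples)

print(adjusted, round(float(adjusted), 2))  # 35/18 1.94, above the true 1.5
```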

In short, variance is a more powerful concept than MAD, because predictions about population parameters can be made from sample data.

And, like those steak knives, there is even more: there is a theorem, the variance theorem, which shows that variances of independent (uncorrelated) variables are additive. This powerful idea underpins regression and analysis of variance. MAD is not additive, and hence it is a much less useful concept in the structure of statistics. Hope this helps.
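The additivity claim can be illustrated with the same two dice: the total of two independent dice should have twice the variance of one die, but not twice the MAD. This is a sketch of one example under that assumption, not a proof, and the function names are my own.

```python
from fractions import Fraction
from itertools import product


def variance(values):
    """Population variance of a list of equally likely values."""
    m = Fraction(sum(values), len(values))
    return sum((x - m) ** 2 for x in values) / len(values)


def mad(values):
    """Population mean absolute deviation of equally likely values."""
    m = Fraction(sum(values), len(values))
    return sum(abs(x - m) for x in values) / len(values)


faces = list(range(1, 7))

# Totals of two independent dice, one entry per equally likely outcome.
totals = [a + b for a, b in product(faces, repeat=2)]

print(variance(totals) == 2 * variance(faces))  # True: variances add
print(mad(totals) == 2 * mad(faces))            # False: MADs do not add
print(float(mad(totals)), float(2 * mad(faces)))
```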