From the Exploring Data website - http://curriculum.qed.qld.gov.au/kla/eda/
© Education Queensland, 1997

Simpson's Paradox

The Ashes Series

The two Waugh brothers, Steve and Mark, decided to have a little wager on who would have the better overall batting average over the two upcoming Ashes Test series, the first in England and the next here in Australia.

After the first Ashes series finished Steve said to Mark, ‘You’ve got your work cut out for you, mate. I have scored 500 runs for 10 outs, for an average of 50. You have 270 runs for 6 outs, for an average of 45.’

After the second Ashes series, Steve said, ‘OK, mate, pay up. In this series I scored 320 runs for 4 outs, an average of 80, while you had 700 runs for 10 outs, which is only an average of 70. I topped you in each series.’

‘Hold on’, Mark said, ‘The wager was for the better batting average overall, not series by series. As I reckon it, you have scored 820 runs for 14 outs, and I have scored 970 runs for 16 outs. My trusty calculator tells me your average is 58.6, while my average is 60.6. A clear case, old son, of being pipped at the bails.’

How is it possible that Steve could have a better average in each of the two series but a lower average overall?
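
One way to start answering this is to check the arithmetic. The following is a minimal Python sketch that uses only the runs and outs quoted in the story; the function and variable names are illustrative and not part of the original material.

```python
from fractions import Fraction

def batting_average(runs, outs):
    """Batting average as runs per out, kept exact with Fraction."""
    return Fraction(runs, outs)

# (runs, outs) for each series, taken from the story above.
steve = {"first": (500, 10), "second": (320, 4)}
mark = {"first": (270, 6), "second": (700, 10)}

def overall(series):
    """Pool runs and outs across both series before dividing."""
    total_runs = sum(runs for runs, _ in series.values())
    total_outs = sum(outs for _, outs in series.values())
    return batting_average(total_runs, total_outs)

for name in ("first", "second"):
    s = batting_average(*steve[name])
    m = batting_average(*mark[name])
    print(f"{name} series: Steve {float(s):.1f}, Mark {float(m):.1f}")

print(f"overall: Steve {float(overall(steve)):.1f}, Mark {float(overall(mark)):.1f}")
```

The overall figure is a weighted average of the two series averages, weighted by outs. Ten of Mark's 16 outs came in the series where he averaged 70, while ten of Steve's 14 outs came in the series where he averaged only 50.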

Our Benevolent Government

Consider this scenario, which is a simplified version of our federal tax system. A government taxes people at two rates. All income below $25 000 is taxed at the low rate of 20% and income above $25 000 is taxed at the high rate of 40%. Being a kind and generous government, it decides to give the people a tax cut. The low rate is cut to 15% and the high rate is cut to 35%. Imagine the surprise of the government when the overall tax rate turns out to be higher than before!

What the government forgot to take into account was inflation (it's easy to forget about these things when you are busy governing). Even though the tax rate for each group was lower, the amount of money taxed at the higher rate had increased so much because of inflation that it more than offset the money lost by cutting tax rates. (This is called ‘bracket creep’. Occasionally the public misuses this term, and mistakenly applies the word ‘creep’ to the tax collector instead.)
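
The scenario gives the rates but no incomes, so the sketch below uses a single hypothetical taxpayer whose income rises from $30 000 to $45 000 through inflation. Those two incomes are assumptions chosen purely for illustration; only the rates and the $25 000 threshold come from the scenario.

```python
from fractions import Fraction

THRESHOLD = 25_000  # income up to this amount is taxed at the low rate

def tax_owed(income, low_pct, high_pct):
    """Tax under a two-rate system; rates are given in whole percent."""
    low_part = min(income, THRESHOLD)
    high_part = max(income - THRESHOLD, 0)
    return Fraction(low_part * low_pct + high_part * high_pct, 100)

def overall_rate(income, low_pct, high_pct):
    """Total tax as a fraction of total income."""
    return tax_owed(income, low_pct, high_pct) / income

# Hypothetical incomes (assumed, not from the source).
income_before = 30_000
income_after = 45_000

rate_before = overall_rate(income_before, 20, 40)  # old rates
rate_after = overall_rate(income_after, 15, 35)    # new, lower rates

print(f"before the cut: {float(rate_before):.1%}")
print(f"after the cut:  {float(rate_after):.1%}")
```

With these assumed incomes, the share of income falling in the high bracket grows from one sixth to four ninths, which is enough to push the overall rate up even though both rates fell.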

Of course, once the government realised its error it immediately returned this extra income to the people.

Simpson's Paradox Explained

The above scenarios are examples of Simpson’s Paradox. Here is David Moore’s definition.

‘Simpson’s Paradox refers to the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group’

Moore, D. and McCabe, G. Introduction to the Practice of Statistics, p190.

The second scenario probably best demonstrates how this can occur. Even though the tax rate for each group had been reduced, the overall tax rate increased because the higher rate applied to a greater percentage of the income.

John Martin has a nice visual way of thinking about the Ashes Series paradox. (Strictly speaking, these are not paradoxes at all, just surprising results that can be fully explained.) A vector diagram illustrates the problem (the diagram is not to scale).

The gradient of OC gives Steve’s batting average (runs/outs) in the first Ashes series while the gradient of OA gives Mark’s batting average. Note that OC is steeper than OA, so OC represents a higher average. Similarly CD is steeper than AB, so Steve’s average in the second series is higher than Mark’s. But OB is steeper than OD, which shows that Mark’s overall average is greater. A vector diagram can be useful in determining the conditions that give rise to Simpson's Paradox.
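
The coordinates of the points in the diagram can be reconstructed from the figures in the Ashes story. The sketch below assumes outs on the horizontal axis and runs on the vertical axis, which is what a gradient of runs/outs implies. The point names follow the text; the helper functions are illustrative.

```python
from fractions import Fraction

def add(p, q):
    """Add two (outs, runs) vectors tip to tail."""
    return (p[0] + q[0], p[1] + q[1])

def gradient(p, q):
    """Slope of the segment from p to q, i.e. runs per out."""
    return Fraction(q[1] - p[1], q[0] - p[0])

O = (0, 0)
C = (10, 500)           # Steve after the first series
D = add(C, (4, 320))    # Steve after the second series
A = (6, 270)            # Mark after the first series
B = add(A, (10, 700))   # Mark after the second series

print("OC steeper than OA:", gradient(O, C) > gradient(O, A))
print("CD steeper than AB:", gradient(C, D) > gradient(A, B))
print("OB steeper than OD:", gradient(O, B) > gradient(O, D))
```

Each overall vector is the sum of two series vectors. Mark's long, fairly steep segment AB pulls his resultant OB above OD, even though each of his segments is shallower than the corresponding segment for Steve.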

Another Cricket Story

The numbers in this story are a bit outrageous, but the outcome is even more surprising.

*****

Lillee and Thommo had a great series against Upper Monrovia as they had each taken 28 wickets for only 60 runs, with only one more innings to play. Lillee says to Thommo, ‘OK, mate, $50 says that I’ll be the better bowler for the series.’

‘You’re on, mate.’ says Thommo.

In the last innings, Lillee took 5 wickets for 40 runs while poor old Thommo was 0 wickets for 24 runs. Who won the bet?

Well, Lillee’s series average is now 3.03 runs per wicket while Thommo’s average is 3.00 runs per wicket! Since the lower average is better, Thommo has won the bet by not taking any wickets!
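
The series averages can be checked with a few lines of Python. This is a minimal sketch using only the figures in the story; the names are illustrative.

```python
from fractions import Fraction

def bowling_average(runs, wickets):
    """Runs conceded per wicket taken; lower is better."""
    return Fraction(runs, wickets)

# Figures before the last innings (the same for both bowlers).
runs_before, wickets_before = 60, 28

# Add each bowler's last-innings figures to the running totals.
lillee = bowling_average(runs_before + 40, wickets_before + 5)
thommo = bowling_average(runs_before + 24, wickets_before + 0)

print(f"Lillee: {float(lillee):.2f} runs per wicket")
print(f"Thommo: {float(thommo):.2f} runs per wicket")
print("Thommo wins:", thommo < lillee)
```

Both averages got worse in the last innings. Lillee's simply got worse by slightly more, because his 40 extra runs outweigh his 5 extra wickets.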

Ask Marilyn

Here is another example of Simpson’s Paradox, from the Ask Marilyn column of Parade Magazine, 28 April 1996, p6.

A reader poses the following question:

A company decided to expand, so it opened a factory generating 455 jobs. For the 70 white collar positions, 200 males and 200 females applied. Of the females who applied, 20% were hired, while only 15% of the males were hired. Of the 400 males applying for the blue collar positions, 75% were hired, while 85% of the females were hired.

A federal Equal Employment enforcement official noted that many more males were hired than females, and decided to investigate. Responding to charges of irregularities in hiring, the company president denied any discrimination, pointing out that in both the white collar and blue collar fields, the percentage of female applicants hired was greater than it was for males.

But the government official produced his own statistics, which showed that a female applying for a job had a 58% chance of being denied employment while male applicants had only a 45% denial rate. As the current law is written, this constituted a violation....Can you explain how two opposing statistical outcomes are reached from the same raw data?

What we have, of course, is an example of Simpson's paradox: The direction of association between gender and hiring rate appears to reverse when the data are aggregated across job classes. Marilyn correctly notes that, even though all the figures presented are correct, the two outcomes are not opposing.
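
Both sets of statistics can be reproduced from the raw counts. The reader's question does not say how many females applied for blue collar positions; the sketch below assumes 100, which is the figure implied by the total of 455 jobs and the one Marilyn uses in her analogy. The data layout and function names are illustrative.

```python
from fractions import Fraction

# (applicants, hired) per job class and gender.
# The 100 female blue collar applicants are inferred, not quoted.
applications = {
    "white collar": {"female": (200, 40), "male": (200, 30)},
    "blue collar": {"female": (100, 85), "male": (400, 300)},
}

def hire_rate(applicants, hired):
    """Fraction of applicants who were hired."""
    return Fraction(hired, applicants)

def pooled_rate(gender):
    """Hiring rate for one gender with both job classes combined."""
    applicants = sum(jobs[gender][0] for jobs in applications.values())
    hired = sum(jobs[gender][1] for jobs in applications.values())
    return hire_rate(applicants, hired)

for job_class, by_gender in applications.items():
    for gender, counts in by_gender.items():
        print(f"{job_class}, {gender}: {float(hire_rate(*counts)):.0%} hired")

for gender in ("female", "male"):
    denial = 1 - pooled_rate(gender)
    print(f"all jobs, {gender}: {float(denial):.0%} denied")
```

Two thirds of the female applicants applied for the white collar positions, where almost everyone was turned down, while two thirds of the male applicants applied for the blue collar positions, where most were hired.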

She also presents her own analogy.

Say a company tests two treatments for an illness. In trial No. 1, treatment A cures 20% of its cases (40 out of 200) and treatment B cures 15% of its cases (30 out of 200). In trial No. 2, treatment A cures 85% of its cases (85 out of 100) and treatment B cures 75% of its cases (300 out of 400)....

So, in two trials, treatment A scored 20% and 85%. Also in two trials, treatment B scored only 15% and 75%. No matter how many people were in those trials, treatment A (at 20% and 85%) is surely better than treatment B (at 15% and 75%), right?

Wrong! Treatment B performed better. It cured 330 (300+30) out of the 600 cases (200+400) in which it was tried--a success rate of 55%...By contrast, treatment A cured 125 (40+85) out of the 300 cases (200+100) in which it was tried, a success rate of only about 42%.

She notes that this is exactly what happened to the employer. Because so many more men applied for the blue collar positions, even if the employer hired all the women who had applied for blue collar positions, it couldn't satisfy the government regulations.
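
That last claim is easy to verify. The sketch below assumes, as in Marilyn's analogy, that 100 women applied for the blue collar positions, and asks what the pooled female hiring rate would have been had every one of them been hired.

```python
from fractions import Fraction

# Pooled male figures: 30 + 300 hired out of 200 + 400 applicants.
male_rate = Fraction(30 + 300, 200 + 400)

# Hypothetical best case: all 100 female blue collar applicants are hired,
# on top of the 40 women hired for white collar positions.
female_rate_if_all_hired = Fraction(40 + 100, 200 + 100)

print(f"male hiring rate:      {float(male_rate):.1%}")
print(f"best-case female rate: {float(female_rate_if_all_hired):.1%}")
print("still below the male rate:", female_rate_if_all_hired < male_rate)
```

Under that assumption the female rate tops out below the male rate, because two thirds of the women applied for the white collar positions, where only 20% were hired.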

Simpson's Paradox in Real Life

Simpson’s Paradox also arises in real-life situations. The National Science Foundation in the US conducted a study of persons who received a degree in science or engineering in 1977 or 1978. The study found that, at the bachelor’s degree level, women with full-time jobs earned on average 77% of the average male salary. But when salaries were compared within each field, the average salary for women was in every case at least 92% of the average male salary. The explanation is what is called a lurking variable, here the field of study: women were concentrated in the life sciences and social sciences, which had lower salaries in general.

A well-known real-life example is the apparent gender inequity in admitting students to graduate study at the University of California, Berkeley, which turns out to be a case of Simpson's Paradox. You may wish to visit the website Simpson's Paradox for further information.