Unit E4 Section 2
Box and Whisker Plots

The box and whisker plot is another way of representing variation; it enables you to illustrate and compare measures of both location and variation using the median and quartiles.

For example, for the data set below, we can easily find the median and quartiles.

The box is formed by the two quartiles, with the median marked by a line, whilst the whiskers are fixed by the two extreme values, 4 and 15.

The plot is shown below, relative to a scale.

Box and whisker plots are particularly useful when quickly comparing two sets of data.

For example, if you wish to compare the data set above with the following data set,

then you can illustrate the two plots together. This is shown below – you can immediately see that the data in the second set are much more spread out than those in the first set.

Outliers

Outliers are extreme values in data sets and are often ignored as they can distort the data analysis. We make the concept precise by defining an outlier as any value which is either more than 1.5 times the interquartile range (IQR) above the upper quartile (UQ) or more than 1.5 times the IQR below the lower quartile (LQ). This is illustrated below.

Any outlier should be marked on the box and whisker diagram but the whisker should extend only to the lowest and highest values which are not outliers.
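The whisker rule above can be sketched in a few lines of Python. This is only an illustration: the function name split_outliers and the sample values are made up for this sketch and do not come from the text, and the quartiles are passed in rather than computed.

```python
def split_outliers(values, lq, uq):
    """Return the whisker ends and the outliers for the given quartiles."""
    iqr = uq - lq
    low_fence = lq - 1.5 * iqr    # values below this are outliers
    high_fence = uq + 1.5 * iqr   # values above this are outliers
    outliers = [x for x in values if x < low_fence or x > high_fence]
    inside = [x for x in values if low_fence <= x <= high_fence]
    # The whiskers stop at the extreme values that are NOT outliers.
    return min(inside), max(inside), outliers


# Hypothetical data, invented for illustration only (LQ = 10, UQ = 16).
sample = [0, 9, 10, 11, 12, 13, 14, 15, 16, 17, 30]
low_whisker, high_whisker, outliers = split_outliers(sample, lq=10, uq=16)
print(low_whisker, high_whisker, outliers)  # 9 17 [0, 30]
```

Here the IQR is 6, so the limits are 1 and 25; the values 0 and 30 are marked separately as outliers and the whiskers end at 9 and 17.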

Worked Examples

1

The numbers of goals scored by the 11 members of a hockey team in 2007 were as follows:

6   0   8   12   2   1   2   9   1   0   11

We first put the numbers of goals in increasing order, i.e.

0   0   1   1   2   2   6   8   9   11   12

(a)

Find the median.

There are 11 data points, so the median is the (11 + 1)/2 th value, i.e. the 6th value, which is 2.

(b)

Find the upper and lower quartiles.

The lower quartile is the (11 + 1)/4 th value, i.e. the 3rd value, which is 1; the upper quartile is the 3(11 + 1)/4 th value, i.e. the 9th value, which is 9.

(c)

Find the interquartile range.

Interquartile range = 9 − 1 = 8.
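Parts (a) to (c) can be checked with a short Python sketch. It assumes the position rule used in this example (the median is the (n + 1)/2 th value and the quartiles are the (n + 1)/4 th and 3(n + 1)/4 th values); the function names are made up for this sketch, and the linear interpolation for fractional positions is an added assumption that is not needed for these 11 values.

```python
def value_at(data, position):
    """Value at a 1-based position in sorted data.

    A fractional position is handled by linear interpolation between
    the two neighbouring values (an assumption of this sketch).
    """
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return data[whole - 1]
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])


def median_and_quartiles(values):
    """Return (LQ, median, UQ) using the (n + 1) position rule."""
    data = sorted(values)
    n = len(data)
    lq = value_at(data, (n + 1) / 4)
    median = value_at(data, (n + 1) / 2)
    uq = value_at(data, 3 * (n + 1) / 4)
    return lq, median, uq


goals_2007 = [6, 0, 8, 12, 2, 1, 2, 9, 1, 0, 11]
lq, median, uq = median_and_quartiles(goals_2007)
print(lq, median, uq, uq - lq)  # 1 2 9 8
```

Other quartile conventions exist and software packages may give slightly different values for the same data, so the rule should be stated whenever results are compared.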

(d)

Explain why, for this set of data, the interquartile range is a more appropriate measure of spread than the range.

The interquartile range is the more appropriate measure here because it describes the spread of the middle half of the data and is not affected by the outlying values, whereas the range depends only on the two extreme values.

(e)

The goals scored by the 11 members of the hockey team in 2008 are summarised in the box and whisker plot below.

(i)

On a copy of the diagram, summarise the results for 2007 in the same way.

(ii)

Do you think the team scored more goals in 2008? Explain your reasoning.

The team scored more goals in 2008; the median in 2007 is much lower.

2

The cumulative frequency curve represents the times taken to run 1500 metres by each of the 240 members of the athletics club, Weston Harriers.

(a)

From the graph, find:

(i)

the median time;

From the dashed line (120 on the vertical axis), the median is 5.25 minutes, or 5 minutes 15 seconds.

(ii)

the upper quartile and the lower quartile.

Similarly:
the upper quartile is 5.75 minutes, or 5 minutes 45 seconds;
the lower quartile is 4 minutes 48 seconds (the width of each small square is 3 seconds).

(b)

Draw a box and whisker plot to illustrate the data.

(c)

Use your box and whisker plot to make one comment about the shape of a histogram for these data.

The data are almost symmetric about the median.
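One way to support this comment numerically is to compare the distance from the median to each quartile. The Python sketch below uses the three times read from the graph in part (a); the helper name to_seconds is made up for this sketch, and the whisker lengths, which also matter for symmetry, are not checked because the extreme values are not given in the text.

```python
def to_seconds(minutes, seconds):
    """Convert a time in minutes and seconds to seconds."""
    return 60 * minutes + seconds


lower_quartile = to_seconds(4, 48)
median = to_seconds(5, 15)
upper_quartile = to_seconds(5, 45)

lower_gap = median - lower_quartile   # median to lower quartile
upper_gap = upper_quartile - median   # median to upper quartile
iqr = upper_quartile - lower_quartile

print(lower_gap, upper_gap, iqr)  # 27 30 57
```

The two halves of the box differ by only 3 seconds (one small square on the graph), which is consistent with describing the data as almost symmetric.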

A rival athletics club, Eastham Runners, also has 240 members. The time taken by each member to run 1500 metres is recorded and these data are shown in the following box and whisker plot.

(d)

Use this diagram to make one comment about the data for Eastham Runners compared with the data for Weston Harriers.

The data for Eastham Runners are skewed to the left, with a lower median time. Hence Eastham Runners' times are generally better than Weston Harriers' times, indicating relatively more athletes with faster times.

3

The ages (in years) of a group of people visiting a club are given below.

(a)

Identify any outlier.

First identify the median and upper and lower quartiles of the 27 data values.

Putting the data in increasing order gives:

The median is the (27 + 1)/2 th value, i.e. the 14th value ⇒ median = 30

The lower quartile is the (27 + 1)/4 th value, i.e. the 7th value ⇒ LQ = 27

The upper quartile is the 3(27 + 1)/4 th value, i.e. the 21st value ⇒ UQ = 37

The interquartile range is 37 – 27 = 10.

Now check for outliers, remembering that the outliers must be less than

LQ – 1.5 × IQR = 27 – 1.5 × 10 = 27 – 15 = 12

or more than

UQ + 1.5 × IQR = 37 + 1.5 × 10 = 37 + 15 = 52

The only outlier is 76.
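The arithmetic for the two limits can be checked with a minimal Python sketch. It uses only the quartiles found above and two of the ages mentioned in the solution (the median, 30, and the value 76); the function name outlier_limits is made up for this sketch.

```python
def outlier_limits(lq, uq):
    """Return (lower limit, upper limit) from the 1.5 x IQR rule."""
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr


low, high = outlier_limits(27, 37)
print(low, high)  # 12.0 52.0

for age in (30, 76):
    is_outlier = age < low or age > high
    print(age, is_outlier)
```

An age counts as an outlier only if it lies strictly outside the two limits, so 30 is not an outlier and 76 is.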

(b)

Illustrate the data using a box and whisker plot.

This box and whisker plot illustrates the data.

Exercises

A manufacturing company needs to place a regular order for components. The manager investigates components produced by three different firms and measures the diameters of a sample of 25 components from each firm.

The results of the measurements for the samples of components from Firm B and Firm C are illustrated in the two box plots shown below.

(a)
(i)

Find the range of the sample of measurements for Firm B.

(ii)

Find the interquartile range of the sample of measurements for Firm C.

The results of the measurements for the sample from Firm A are summarised as follows.

Median = 25.0 mm, lower quartile = 23.4 mm, upper quartile = 26.5 mm, lowest value = 22.5 mm, highest value = 27.3 mm.

Draw, on a copy of the grid above, a box plot to illustrate the sample results for Firm A.

(b)

The manager studies the three box plots to decide which firm's components he should use. The components he requires should have a diameter of 25 mm, but some variation above and below this measurement is bound to happen and is acceptable. Any components with diameters below 24 mm or above 26 mm will have to be thrown away. State which firm's components you think the manager should choose.

The weights, to the nearest kilogram, of 19 pigs were:

36  38  30  31  38  43  55  38  37  30  48  41  33  25  34  43  37  40  36

(a)
(i)

Find the interquartile range of the weights.

(ii)

Find any weights that are outliers.

Any outliers will be in the regions:

x < ____

x > ____

So there is only one outlier, ____.

The median of the data is 37 kg.

Draw a box plot for the data.

(b)

Name a distribution that could be used to model the weight of these pigs.

____ distribution, as it is symmetric.

The length of reign of each of the last 19 English monarchs is given in the table.

(a)

Find the median and quartiles of the length of reign of these 19 monarchs.

Lower quartile:

Median:

Upper quartile:

(b)

Write down the name of any monarch whose length of reign is an outlier.

Any outliers will be in the regions:

x < ____

x > ____

So the outliers are ____.

The box and whisker plot shows the length of reign of the last 19 Popes.

(c)

Compare the length of reign of monarchs and Popes.

Generally, ____ appear to reign longer.