From the Exploring Data website - http://curriculum.qed.qld.gov.au/kla/eda/
© Education Queensland, 1997

Advanced Stemplots

‘Advanced stemplots’ is really a contradiction in terms: stemplots should, by their nature, be simple to construct. Nonetheless, some datasets call for a stemplot that takes more effort than usual to construct.

An advanced stemplot includes one or more of these features:

Split stems: One purpose of a stemplot is to display the shape of the distribution. For some datasets a satisfactory display is achieved only by splitting each stem into two parts, e.g. one part holding leaves 0 to 4 and the other leaves 5 to 9. Other datasets may benefit if each stem is split into five parts: 0-1, 2-3, 4-5, 6-7, 8-9.
Truncated data values: For data with a large number of significant digits, it is common to decide how many digits are needed and then truncate the data. Little is lost by doing this, as the essence of the original data is retained. Data is truncated rather than rounded because truncating is easier to do.
Outliers: Imagine a dataset that contains an extreme outlier. It isn't sensible to extend the stem to include the outlier, as this means including row after row of empty stems. Most computer-generated stemplots display the outlier outside the stemplot proper, at the top or bottom as appropriate, labelled HIGH or LOW. It is a matter of judgement when to adopt this approach.
Scaling: If the values to be plotted are extremely large or extremely small, the data has to be scaled by multiplying or dividing by a power of 10. For example, NCSS Jr. scales the data to remove decimal points.
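As a rough illustration of the difference between truncating and rounding, here is a minimal Python sketch. The function names truncate and round_half_up are hypothetical, and the three values are used purely as examples (they also appear in the Fleas dataset further down the page).

```python
def truncate(value: int, drop_digits: int = 1) -> int:
    """Drop the last `drop_digits` digits of a non-negative integer, without rounding."""
    return value // 10 ** drop_digits


def round_half_up(value: int, drop_digits: int = 1) -> int:
    """Round a non-negative integer to the same precision, for comparison only."""
    step = 10 ** drop_digits
    return (value + step // 2) // step


# Compare the two approaches on a few three-digit values.
for v in (436, 495, 585):
    print(v, "truncated:", truncate(v), "rounded:", round_half_up(v))
```

Truncating 495 gives 49, so the value stays on stem 4, whereas rounding would move it to 50 and onto stem 5. Truncation only needs the unwanted digits to be crossed off, which is why it is the quicker choice when working by hand.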

Three Examples

The Density of the Earth Dataset

In 1798, Henry Cavendish measured the density of the earth using an instrument called a torsion balance. While the density of the earth is obviously not uniform, the value of the mean density is important in determining the earth’s composition. The units are grams/cm³.

Density Measurements

5.5 5.57 5.42 5.61 5.53 5.47 4.88 5.62 5.63 4.07 5.29 5.34
5.26 5.44 5.46 5.55 5.34 5.3 5.36 5.79 5.75 5.29 5.1 5.86
5.58 5.27 5.85 5.65 5.39

Here is the NCSS 6.0 Jr stemplot of this dataset, along with some comments.

Stem-Leaf Plot Section of Density
Depth  Stem | Leaves
        Low | 407
    2    48 | 8
    2    49 |
    2    50 |
    3    51 | 0
    7    52 | 6 7 9 9
   12    53 | 0 4 4 6 9
  (4)    54 | 2 4 6 7
   13    55 | 0 3 5 7 8
    8    56 | 1 2 3 5
    4    57 | 5 9
    2    58 | 5 6

Unit = .01 Example: 1 |2 Represents 0.12

Comments

The Depth column records the cumulative number of data values, counting in from each end. The entry in brackets locates the row that contains the median, and gives the number of entries in that row.
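The Depth column can be reproduced from the number of leaves in each row. The following Python sketch is one way to do it, assuming that the median row is the one where the counts from both ends each exceed half the data; depth_column is a hypothetical name, not an NCSS function.

```python
from itertools import accumulate


def depth_column(row_counts):
    """Return the depth entry for each row, given the number of leaves per row (top to bottom)."""
    n = sum(row_counts)
    from_top = list(accumulate(row_counts))
    from_bottom = list(accumulate(reversed(row_counts)))[::-1]
    depths = []
    for count, top, bottom in zip(row_counts, from_top, from_bottom):
        if 2 * top > n and 2 * bottom > n:
            # Row containing the median: show its own count in brackets.
            depths.append(f"({count})")
        else:
            # Otherwise count in from whichever end is nearer.
            depths.append(str(min(top, bottom)))
    return depths


# Leaves per row in the density stemplot: Low, 48, 49, ..., 58.
density_counts = [1, 1, 0, 0, 1, 4, 5, 4, 5, 4, 2, 2]
print(depth_column(density_counts))
```

For the density stemplot this gives 1, 2, 2, 2, 3, 7, 12, (4), 13, 8, 4, 2. The first entry belongs to the Low row, whose depth NCSS leaves blank even though the outlier is counted in the rows that follow.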

NCSS has chosen a two-digit stem with single digit leaves.

The outlier is labelled as ‘Low’ and the entire value (407, representing 4.07) is given in the 'Leaves' column.

The scale is given at the bottom. For this dataset NCSS multiplies each value by 100 to remove the decimal point. For example, 54 | 2 represents a scaled value of 542. Multiply this by the unit (.01) to recover the original value:
542 × .01 = 5.42
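The same arithmetic can be written as a short Python sketch. The function name leaf_to_value is hypothetical, and Decimal is used here only so that the unit .01 is handled exactly rather than as a binary floating-point number.

```python
from decimal import Decimal


def leaf_to_value(stem: int, leaf: int, unit: Decimal) -> Decimal:
    """Rebuild a data value from a stem, a single-digit leaf, and the plot's unit."""
    scaled = stem * 10 + leaf  # e.g. stem 54, leaf 2 -> 542
    return scaled * unit


print(leaf_to_value(54, 2, Decimal("0.01")))  # 5.42
print(leaf_to_value(1, 2, Decimal("0.01")))   # 0.12, the example printed under the plot
```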


The Metric Dataset

Shortly after metric units were introduced in Australia, a group of 44 students was asked to guess, to the nearest metre, the width of the lecture hall in which they were sitting. The true width of the hall was 13.1 metres.

Guesses (Metres)

8 9 10 10 10 10 10 10 11 11 11 11 12 12 13 13 13
14 14 14 15 15 15 15 15 15 15 15 16 16 16 17 17
17 17 18 18 20 22 25 27 35 38 40

Stem-Leaf Plot Section of Guess
Depth  Stem | Leaves
    2     . | 8 9
   12    1* | 0 0 0 0 0 0 1 1 1 1
   17     T | 2 2 3 3 3
 (11)     F | 4 4 4 5 5 5 5 5 5 5 5
   16     S | 6 6 6 7 7 7 7
    9     . | 8 8
    7    2* | 0
    6     T | 2
    5     F | 5
       High | 27, 35, 38, 40

Unit = 1 Example: 1 |2 Represents 12

Comments

To achieve the best display, NCSS Jr has split the stems into five parts. The labels used are as follows:
‘*’ represents 0-1
‘T’ represents 2-3
‘F’ represents 4-5
‘S’ represents 6-7, and
‘.’ represents 8-9.
This is a common method of splitting stems.
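The labelling rule above can be sketched in Python as follows. The names SYMBOLS and five_part_row are hypothetical, and the list holds only a few of the guesses from the dataset, chosen to touch each label.

```python
from collections import defaultdict

SYMBOLS = "*TFS."  # labels for the leaf pairs 0-1, 2-3, 4-5, 6-7, 8-9


def five_part_row(value: int):
    """Return (stem digit, row symbol) for a non-negative integer with unit 1."""
    stem, leaf = divmod(value, 10)
    return stem, SYMBOLS[leaf // 2]


# A handful of the guesses, not the full dataset.
guesses = [8, 9, 10, 10, 13, 14, 15, 17, 18, 20, 22, 25]

rows = defaultdict(list)
for g in sorted(guesses):
    rows[five_part_row(g)].append(g % 10)

for (stem, symbol), leaves in rows.items():
    print(f"{stem}{symbol} | {' '.join(map(str, leaves))}")
```

This sketch prints the stem digit on every row, whereas the NCSS output shows it only on the ‘*’ row and leaves it implied on the others.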

The data exhibits two peaks, which arise because students chose 10 and 15 more often than neighbouring numbers. This reflects our number system and the rounding inherent in estimation. Because the stemplot retains the original data values, the reason for the two peaks can be determined from the stemplot itself.

The four high outliers are listed in the 'High' row at the end of the stemplot.


The Fleas Dataset

Researchers at the Purdue University School of Veterinary Medicine deposited 25 female and 10 male fleas in the fur of a cat, in order to study the egg production of the flea. The number of eggs produced by the fleas over 27 consecutive days is given below.

Source: Introduction to the Practice of Statistics, David Moore and George McCabe, p.27

Day   No. of eggs   Day   No. of eggs
  1   436            15   550
  2   495            16   487
  3   575            17   585
  4   444            18   549
  5   754            19   475
  6   915            20   435
  7   945            21   523
  8   655            22   390
  9   782            23   425
 10   704            24   415
 11   590            25   450
 12   411            26   395
 13   547            27   405
 14   584

Stem-Leaf Plot Section of No_Fleas
Depth  Stem | Leaves
    2     3 | 9 9
    9     4 | 0 1 1 2 3 3 4
   13     4 | 5 7 8 9
  (3)     5 | 2 4 4
   11     5 | 5 7 8 9 9
    6     6 |
    6     6 | 5
    5     7 | 0
    4     7 | 5 8
       High | 91, 94

Unit = 10 Example: 1 |2 Represents 120

Comments

The stemplot shows that the data fall into two clusters.

The last digit of the data was truncated. While some detail from the original data is lost, the stemplot is easier to read.

This stemplot shows an alternative method of displaying split stems. Instead of using the symbols '*' and '.', the leading digit is repeated on two rows. Both methods are common, though the '*' and '.' symbols are the traditional choice.
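A minimal Python sketch of how a value is placed in this kind of plot: the last digit is truncated, and the remaining leaf decides whether the value goes on the first row for its stem (leaves 0-4) or the second (leaves 5-9). The name two_part_row is hypothetical, and only a few of the egg counts are shown.

```python
def two_part_row(value: int):
    """Truncate the units digit, then return (stem, half, leaf).

    half is 0 for the first row of the stem (leaves 0-4)
    and 1 for the second row (leaves 5-9).
    """
    stem, leaf = divmod(value // 10, 10)
    half = 0 if leaf <= 4 else 1
    return stem, half, leaf


# A few of the daily egg counts, not the full dataset.
for eggs in (390, 436, 495, 523, 655):
    stem, half, leaf = two_part_row(eggs)
    which = "first" if half == 0 else "second"
    print(f"{eggs}: stem {stem}, {which} row, leaf {leaf}")
```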

The two high outliers are listed at the bottom of the stemplot to two significant figures.

Because the data was gathered over time, a timeplot should also be constructed to show how the number of eggs changed from day to day.