From the Exploring Data website - http://curriculum.qed.qld.gov.au/kla/eda/
© Education Queensland, 1997
Advanced Stemplots
The term "advanced stemplot" is something of a contradiction, since stemplots by their nature should be simple to construct. Nonetheless, there are times when a stemplot is wanted and constructing it takes more effort than usual.
An advanced stemplot includes one or more of these features:
| Split stems | One purpose of a stemplot is to display the shape of the distribution. To achieve a satisfactory display of some datasets, each stem is best split into two parts, e.g. with one part containing leaf values from 0 to 4 and the other part from 5 to 9. Other datasets may benefit if each stem is split into five parts: 0-1, 2-3, 4-5, 6-7, 8-9. |
| Truncated data values | For data with a large number of significant digits, it is common to decide how many digits are needed and then truncate the data. Not much is lost by doing this, as the essence of the original data is still retained. The data are truncated rather than rounded because truncation is easier to do. |
| Outliers | Imagine a dataset that contains an extreme outlier. It isn't sensible to extend the stem to include the outlier, because doing so means including row after row of empty stems. Most computer-generated stemplots display the outlier as a data value outside of the stemplot proper, at the top or bottom of the stemplot as appropriate, and labelled as HIGH or LOW. It is a matter of judgement when to adopt this approach. |
| Scaling | If the values to be plotted are extremely large or extremely small, the data have to be scaled by multiplying or dividing by a power of 10. For example, NCSS Jr. scales the data to remove decimal points. |
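As a rough illustration of truncation and scaling, the Python sketch below splits a value into a stem and a single-digit leaf. The function name `stem_and_leaf` and its `unit` parameter are assumptions made for this example (they mirror the "Unit =" line that NCSS prints, not any NCSS interface), and only non-negative values are considered.

```python
from decimal import Decimal

def stem_and_leaf(value, unit):
    """Return (stem, leaf) for a non-negative value, where one leaf step is worth `unit`.

    Hypothetical helper, for illustration only.
    """
    # Scaling: divide by the unit so the leaf becomes the last whole-number digit.
    # Decimal is used to avoid binary floating-point error in the division.
    scaled = Decimal(str(value)) / Decimal(str(unit))
    # Truncation: int() discards any digits below the unit instead of rounding.
    return divmod(int(scaled), 10)

print(stem_and_leaf(5.42, 0.01))  # unit .01
print(stem_and_leaf(495, 10))     # unit 10: the final digit is dropped
```

With a unit of 10, the value 495 is truncated to 49 tens and lands on stem 4 with leaf 9; rounding to the nearest 10 would have moved it to stem 5.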
In 1798, Henry Cavendish measured the density of the Earth using an instrument called a torsion balance. While the density of the Earth is obviously not uniform, the value of the mean density is important in determining the Earth's composition. The units are g/cm³.
| 5.5 | 5.57 | 5.42 | 5.61 | 5.53 | 5.47 | 4.88 | 5.62 | 5.63 | 4.07 | 5.29 | 5.34 |
| 5.26 | 5.44 | 5.46 | 5.55 | 5.34 | 5.3 | 5.36 | 5.79 | 5.75 | 5.29 | 5.1 | 5.86 |
| 5.58 | 5.27 | 5.85 | 5.65 | 5.39 |
Here is the NCSS 6.0 Jr stemplot of this dataset along with some comments.
Stem-Leaf Plot Section of Density
Unit = .01 Example: 1 |2 Represents 0.12
Comments: The Depth column records the cumulative number of data values, counting in from each end. The entry in brackets locates the row that contains the median, and gives the number of entries in that row. NCSS has chosen a two-digit stem with single-digit leaves. The outlier is labelled as Low and the entire value (407, representing 4.07) is given in the 'Leaves' column. The scale is given at the bottom. For this dataset NCSS multiplies each value by 100 to remove the decimal point. For example, 54 | 2 represents a value of 542. Multiply this by the unit (.01) to return the original value: 542 × .01 = 5.42.
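The Depth column described in the comments can be sketched in Python as follows. This is a minimal reconstruction from that description, not NCSS code: the name `depth_column` is hypothetical, the sketch assumes the median falls inside a single row (true here, since there are 29 values), and it keeps the low outlier on its own stem rather than in a separate Low row.

```python
from collections import Counter

# Cavendish's 29 density measurements (g/cm^3), as listed above.
cavendish = [
    5.5, 5.57, 5.42, 5.61, 5.53, 5.47, 4.88, 5.62, 5.63, 4.07, 5.29, 5.34,
    5.26, 5.44, 5.46, 5.55, 5.34, 5.3, 5.36, 5.79, 5.75, 5.29, 5.1, 5.86,
    5.58, 5.27, 5.85, 5.65, 5.39,
]

# Scale by 100 (Unit = .01). round() only removes floating-point noise here,
# because every value already has at most two decimal places.
scaled = [round(v * 100) for v in cavendish]

# Number of leaves on each two-digit stem, e.g. 542 falls on stem 54.
leaf_counts = Counter(v // 10 for v in scaled)

def depth_column(leaf_counts):
    """Map each stem to its depth entry (hypothetical helper)."""
    n = sum(leaf_counts.values())
    middle = (n + 1) / 2          # position of the median in sorted order
    depths, seen = {}, 0
    for stem in sorted(leaf_counts):
        count = leaf_counts[stem]
        before, after = seen, seen + count
        if before < middle <= after:
            depths[stem] = f"({count})"      # median row: bracketed row count
        elif after < middle:
            depths[stem] = str(after)        # count in from the low end
        else:
            depths[stem] = str(n - before)   # count in from the high end
        seen = after
    return depths

for stem, depth in depth_column(leaf_counts).items():
    print(f"{depth:>4}  {stem}")
```

For this dataset the median is the 15th of the 29 sorted values, which lies on stem 54, so that row shows its own count in brackets while the rows on either side count in from the nearer end.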
The Metric Dataset
Shortly after metric units were introduced in Australia, a group of 44 students was asked to guess, to the nearest metre, the width of the lecture hall in which they were sitting. The true width of the hall was 13.1 metres.
| 8 | 9 | 10 | 10 | 10 | 10 | 10 | 10 | 11 | 11 | 11 | 11 | 12 | 12 | 13 | 13 | 13 |
| 14 | 14 | 14 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 16 | 16 | 16 | 17 | 17 |
| 17 | 17 | 18 | 18 | 20 | 22 | 25 | 27 | 35 | 38 | 40 |
Stem-Leaf Plot Section of Guess
Unit = 1 Example: 1 |2 Represents 12
Comments: To achieve the best display, NCSS Jr has split the stems into five parts. The labels used are as follows:
The data exhibit two peaks, which are due to students choosing 10 and 15 more often than numbers near to those. It is a reflection of our number system and the rounding inherent in estimation. Since the original data are retained, the reason for the two peaks can be determined from the stemplot. There are four high outliers, which are given in the 'High' row at the end of the stemplot.
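To see how a five-part split brings out the two peaks, the Python sketch below groups the 44 guesses by stem and by leaf range. The function name `split_stem_rows` is an assumption for this example, each row is labelled with its leaf range (0-1, 2-3, and so on) rather than with NCSS's own symbols, and no values are moved to a separate High row.

```python
from collections import defaultdict

# The 44 guesses (in metres), as listed above.
guesses = [
    8, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 13, 13, 13,
    14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 17, 17,
    17, 17, 18, 18, 20, 22, 25, 27, 35, 38, 40,
]

def split_stem_rows(values, parts=5):
    """Group leaves by (stem, part); hypothetical helper for illustration."""
    width = 10 // parts            # leaf digits per row: 2 when parts=5
    rows = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf // width)] += str(leaf)
    return dict(rows)

rows = split_stem_rows(guesses)
for (stem, part), leaves in sorted(rows.items()):
    low = part * 2                 # first leaf digit covered by this row
    print(f"{stem} [{low}-{low + 1}] | {leaves}")
```

On stem 1, the rows for leaves 0-1 and 4-5 are the longest (10 and 11 leaves), and because the leaves themselves are printed, it is easy to see that the guesses 10 and 15 account for most of them.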
Researchers at the Purdue University School of Veterinary Medicine deposited 25 female and 10 male fleas in the fur of a cat, in order to study the egg production of the flea. The number of eggs produced by the fleas over 27 consecutive days is given below.
Source: Introduction to the Practice of Statistics, David Moore and George McCabe, p.27
| Day | No. of eggs | Day | No. of eggs |
| 1 | 436 | 15 | 550 |
| 2 | 495 | 16 | 487 |
| 3 | 575 | 17 | 585 |
| 4 | 444 | 18 | 549 |
| 5 | 754 | 19 | 475 |
| 6 | 915 | 20 | 435 |
| 7 | 945 | 21 | 523 |
| 8 | 655 | 22 | 390 |
| 9 | 782 | 23 | 425 |
| 10 | 704 | 24 | 415 |
| 11 | 590 | 25 | 450 |
| 12 | 411 | 26 | 395 |
| 13 | 547 | 27 | 405 |
| 14 | 584 | | |
Stem-Leaf Plot Section of No_Fleas
Unit = 10 Example: 1 |2 Represents 120
Comments: The stemplot shows that the data have two clusters. The last digit of each data value was truncated. While some detail from the original data is lost, the stemplot is easier to read. This stemplot shows an alternative method of displaying split stems. Instead of using the symbols '*' and '.', the leading digit is repeated. Both methods are common, though the '*' and '.' are traditional. The two high outliers are listed at the bottom of the stemplot to two significant figures. As the data were gathered over time, a timeplot should also be constructed, as this shows how the number of eggs changed from day to day.
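The truncation and the repeated-stem style of split can be reproduced with a short Python sketch. The name `truncated_rows` is hypothetical, the layout only approximates the NCSS display, and the two largest values are printed on their own stem here rather than in a separate outlier row.

```python
# Eggs counted on days 1 to 27, as listed in the table above.
eggs = [
    436, 495, 575, 444, 754, 915, 945, 655, 782, 704, 590, 411, 547, 584,
    550, 487, 585, 549, 475, 435, 523, 390, 425, 415, 450, 395, 405,
]

def truncated_rows(values, unit=10):
    """Truncate to `unit`, then split each stem into halves 0-4 and 5-9.

    Hypothetical helper for illustration; returns {(stem, half): [leaves]}.
    """
    rows = {}
    for v in sorted(values):
        # v // unit drops the last digit (truncation), e.g. 495 -> 49.
        stem, leaf = divmod(v // unit, 10)
        half = 0 if leaf < 5 else 1
        rows.setdefault((stem, half), []).append(leaf)
    return rows

for (stem, half), leaves in sorted(truncated_rows(eggs).items()):
    # The stem digit is simply repeated for the upper half of each stem.
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Each stem digit appears on up to two rows, first for leaves 0 to 4 and then for leaves 5 to 9. The values 915 and 945 appear as 91 and 94, matching the two significant figures mentioned in the comments.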