On the Side

Histograms

Introduction

The text on this page is available as a web page, suitable for copying and pasting.

Activities

STEPS
Tutorials

Interactive computer-based tutorials

Worksheets

Histogram Worksheet

This worksheet recommends that you recycle existing datasets and stories.

Matching Histograms and Boxplots

Matching Histograms and Summary Statistics

Datasets
Visit the Datasets page for more datasets and stories to support this topic.

Or, get some ideas on how students can gather their own data

FYI

How Wide Is Your Bin?

An article on how computers choose their default bin widths.

The Density Trace

A density trace is an alternative display to a histogram.

Old Unfaithful

A histogram may not display a dataset accurately. See for yourself.

  As a teacher of junior maths and Maths in Society, I used to think that a histogram was a rather trivial statistical object, sort of a bar graph with the gaps removed to save space. I never realised that statisticians actually find histograms to be useful!

A modern data-centred approach to statistics starts with viewing the data in a variety of ways. What is meant by viewing the data? Features of interest to a statistician are the overall shape of the data, symmetry, the location and the spread, existence of outliers and evidence of clusters or gaps. A histogram with a scale on the horizontal axis is generally useful for showing all of these features, though for some distributions the features of a dataset can be disguised or distorted due to a particular choice of bin width.

One application of the humble histogram is determining whether a set of data is approximately normally distributed, though a histogram is most effective for this purpose if the dataset is large. Normality is a precondition for certain analyses of data, including many hypothesis tests. While there are formal tests of normality, often a quick look at a histogram of the data is sufficient, and no statistician would rely strictly on formal tests without also viewing the data.

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including the graphical display of data. Visit the STEPS page for further information and a list of the modules available.

Histograms and Stemplots Compared

A histogram shows much the same information as a stemplot, though for a given dataset one or the other of these methods of displaying the data may be preferable. Some points to note:

  • Histograms are preferable for larger datasets as stemplots become unwieldy;
  • With histograms, the original data are usually lost;
  • The choice of bin size or number of bins is not restricted, as it is with a stemplot;
  • Histograms take more time than a stemplot to construct by hand; therefore stemplots are preferable for a small dataset.
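
The points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `scores` values are made up, and the helper names `stemplot` and `bin_counts` are hypothetical. The stemplot keeps every value readable from its leaves, while the histogram counts keep only how many values fell in each bin.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers, using tens as stems."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

def bin_counts(values, width=10):
    """Count values per bin index k, where bin k covers [k*width, (k+1)*width)."""
    counts = defaultdict(int)
    for v in values:
        counts[v // width] += 1
    return dict(sorted(counts.items()))

scores = [47, 52, 55, 58, 61, 63, 63, 66, 70, 74, 78, 91]  # made-up values
print("\n".join(stemplot(scores)))
print(bin_counts(scores))            # the individual scores are no longer recoverable
print(bin_counts(scores, width=25))  # a bin width a stemplot could not offer
```

Note that the stemplot is tied to stems of ten here, whereas `bin_counts` accepts any width.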

Matching Histograms and Boxplots

Students will improve their ability to interpret the information given in a boxplot by matching boxplots of sample data drawn from different distributions with their associated histograms.

Matching Histograms and Summary Statistics

Students will improve their ability to visualise the shape of a distribution given the summary statistics.

Bin Width

Statistics computer programs and graphical calculators will generate a default histogram if bin width or the number of bins is not specified. It is interesting that there is no clear winner in the choice of algorithm used for choosing the number of bins or the bin width. The article How Wide Is Your Bin? contains an interesting thread (i.e. a discussion topic) from the Ed-Stats mailing list.
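
To make the lack of a clear winner concrete, here is a minimal sketch of three commonly cited default rules: Sturges, Scott and Freedman-Diaconis. The choice of these three is an assumption made for illustration; the text above does not say which rule any particular program or calculator uses. The sample is randomly generated, not a real dataset.

```python
import math
import random
import statistics

def sturges_bins(data):
    """Sturges' rule: number of bins = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(len(data))) + 1

def scott_width(data):
    """Scott's rule: bin width = 3.49 * s * n^(-1/3), s = sample standard deviation."""
    return 3.49 * statistics.stdev(data) * len(data) ** (-1 / 3)

def freedman_diaconis_width(data):
    """Freedman-Diaconis rule: bin width = 2 * IQR * n^(-1/3)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) * len(data) ** (-1 / 3)

def bins_from_width(data, width):
    """Number of bins of the given width needed to cover the data range."""
    return math.ceil((max(data) - min(data)) / width)

random.seed(1)
sample = [random.gauss(50, 10) for _ in range(200)]  # synthetic data for illustration

print("Sturges:          ", sturges_bins(sample), "bins")
print("Scott:            ", bins_from_width(sample, scott_width(sample)), "bins")
print("Freedman-Diaconis:", bins_from_width(sample, freedman_diaconis_width(sample)), "bins")
```

Sturges' rule depends only on the sample size, while the other two also depend on the spread of the data, so the rules can disagree on the same dataset.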

The Density Trace

With the widespread use of computers in modern statistics, new methods of displaying a dataset have been invented. NCSS 6.0 Jr allows the user to add a display called a density trace to a histogram. The density trace can be thought of as a smoothed histogram in which the problems caused by fixed bin widths are obviated. It is displayed as the curved line in the diagram. The article The Density Trace, which is the NCSS Jr help file on this topic, discusses this plot further.
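
The idea of a smoothed histogram can be sketched with a Gaussian kernel density estimate. This is an assumption made for illustration: the text above does not describe how NCSS computes its density trace, and the function name `density_trace`, the data values and the bandwidth are all hypothetical.

```python
import math

def density_trace(data, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each point of grid."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        for x in grid
    ]

# Made-up values with two clusters, for illustration only.
data = [2.1, 2.4, 2.5, 3.0, 3.2, 5.8, 6.1, 6.3, 6.4, 7.0]
grid = [i * 0.5 for i in range(19)]  # 0.0, 0.5, ..., 9.0
trace = density_trace(data, grid, bandwidth=0.6)

# Crude text plot: a longer bar means a higher estimated density.
for x, d in zip(grid, trace):
    print(f"{x:4.1f} {'#' * round(d * 100)}")
```

Instead of dropping each observation into a fixed bin, each observation contributes a small bump centred on its own value, so no bin boundaries are involved. The bandwidth plays a role similar to bin width: a larger value gives a smoother curve.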

The Histogram and Stemplot Compared

A histogram is an alternative to a stemplot for displaying data. A stemplot is restricted by our number system to certain bin widths; a histogram is under no such restriction. However, you usually lose the actual data values, and constructing a histogram by hand is a tedious process.

When constructing a histogram by hand, a decision about bin sizes and the number of bins has to be made when tabulating the data. A poor decision can result in a histogram that either gives misleading information about the data or fails to inform the viewer about some aspect of the data. A computer is of value here, as a variety of histograms, each with a different bin width, can be constructed. Which histogram is preferred depends upon which aspects of a dataset are to be featured.
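
The tabulation step described above might look like the following sketch. The function names `tabulate` and `show` and the data values are hypothetical. The same small dataset is tabulated with three different bin widths.

```python
import math

def tabulate(data, width, start=None):
    """Count observations in bins [start + k*width, start + (k+1)*width)."""
    if start is None:
        # Start the first bin at a multiple of the bin width.
        start = math.floor(min(data) / width) * width
    n_bins = int((max(data) - start) // width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // width)] += 1
    return start, counts

def show(data, width):
    """Print a text histogram with one row per bin."""
    start, counts = tabulate(data, width)
    print(f"bin width {width}")
    for k, c in enumerate(counts):
        print(f"{start + k * width:6.1f} | {'*' * c}")

data = [1, 2, 2, 3, 7, 8, 8, 9]  # made-up values with a gap in the middle
for width in (1, 2, 5):
    show(data, width)
```

With a bin width of 2 the gap between 3 and 7 shows up as an empty bin; with a bin width of 5 the two bins hold four values each and the gap disappears.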

Beware the Humble Histogram!

Ideally a histogram should show the shape of the distribution of the data. For some datasets, the choice of bin width can have a profound effect on how the histogram displays the data. To see this for yourself, have a look at the Histogram Applet, from R. Webster West, Dept. of Statistics, Univ. of South Carolina (you will need a Java-enabled browser to see the applet). It is a histogram of the interruption time (i.e. time between eruptions) of the Old Faithful Geyser in Wyoming, USA. Slide the bar to change bin widths, and watch how that affects the shape of the histogram. Will you ever trust a histogram again?

As most classrooms don’t have Internet access on tap, the Word document Old Unfaithful contains a series of histograms of the interruption time of the Old Faithful geyser. The series nicely shows the effect of bin size on the appearance of the histogram.
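
For readers without the applet or the Word document, a rough imitation of the experiment can be run in Python. The data below are synthetic: two randomly generated clusters chosen for illustration, not the real Old Faithful measurements. The helper names are hypothetical. The sketch counts how many peaks the histogram appears to have at each bin width.

```python
import random

def counts_for_width(data, width):
    """Bin counts for bins of the given width, aligned to multiples of width."""
    lo = int(min(data) // width)
    hi = int(max(data) // width)
    counts = [0] * (hi - lo + 1)
    for x in data:
        counts[int(x // width) - lo] += 1
    return counts

def count_peaks(counts):
    """Number of bins strictly taller than both neighbours (outside the range counts as 0)."""
    padded = [0] + counts + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

random.seed(0)
# Synthetic two-cluster data, NOT the real Old Faithful measurements.
waits = [random.gauss(55, 5) for _ in range(100)] + [
    random.gauss(80, 5) for _ in range(170)
]

for width in (1, 5, 10, 40):
    counts = counts_for_width(waits, width)
    print(f"width {width:>2}: {len(counts):>2} bins, {count_peaks(counts)} peak(s)")
```

Very narrow bins tend to produce many spurious peaks from sampling noise, while very wide bins can merge the two clusters into a single lump.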
