From the Exploring Data website - http://curriculum.qed.qld.gov.au/kla/eda/
© Education Queensland, 1997

Student Generated Data

Introduction

When studying statistics, students must work with real data. After all, if the data is made up, who cares whether it contains patterns or anomalies? We could simply change the data and remove them. Who would bother trying to decipher what the data says if the data has no message?

So where do students find real data? Broadly speaking there are two sources of real data - data that is gathered by others, and data that is generated by students. The Datasets page of this website contains an ample supply of data gathered by others. Working with these datasets is important, as it allows students to see how statistics is used to help answer important questions.

But we will be doing our students' statistics education a disservice if we don't allow them to experience the often messy process of gathering their own data. Sometimes it is easy - finding out the distribution of ages in a year 11 Maths A class (yawn), for example. But more often, gathering good data is not easy. Selecting a truly random sample may be prohibitively expensive or time-consuming, taking reliable measurements may be difficult, or the topic may incite passions, and hence bias, on the part of the researchers. To appreciate these issues, students need to carry out the data-gathering process themselves and reflect on it afterwards.

There is a spin-off, fortunately. If the students have input into the problem to be solved, and the problem has some inherent interest, or if the data-gathering exercise is enjoyable, then the students may gain some degree of ownership of the problem, and have a greater degree of interest in the outcome.

Recycle Your Datasets

Gathering data uses up precious class time, so it makes sense to re-use the data for a variety of purposes. For example, if you gather data on student height and reach to make a scatterplot, the data can also be used for the study of graphical displays and summary statistics. If the data is categorised by sex, a t-test can be used to determine whether the difference between males and females is significant. Recycling makes just as much sense with datasets as it does with newspapers.

Univariate Data - Some Ideas for Gathering Data

Greed is a game described in an article found on the Stemplots page of the website. Students find it an enjoyable way of gathering data, which can then be displayed as a stemplot, dotplot, histogram or boxplot. If the game is played with a single class, the dataset is small enough to enter into a graphics calculator by hand.

If all desks in the classroom are similar, students can measure the length of the desk as accurately as possible, with the results collected by the teacher at the board. The variation in the answers may surprise some students, and a histogram or a stemplot will reveal the shape of the distribution of measurements. Hopefully it will be somewhat normally distributed, with an outlier or two because of an error in measuring. This would provide a background for meaningful discussion about measurement errors.
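If the class measurements are typed into a computer rather than written on the board, a stemplot can be produced with a few lines of code. The following is only a minimal sketch: the function name stemplot and the desk lengths are invented for illustration, and it assumes non-negative measurements recorded in whole millimetres.

```python
from collections import defaultdict


def stemplot(values, leaf_unit=1):
    """Return the lines of a simple stem-and-leaf plot.

    Each value is divided by leaf_unit and rounded to a whole number.
    The last digit becomes the leaf and the remaining digits the stem.
    Assumes non-negative values.
    """
    scaled = sorted(round(v / leaf_unit) for v in values)
    leaves = defaultdict(list)
    for s in scaled:
        leaves[s // 10].append(s % 10)
    lines = []
    # Include empty stems so that gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>4} | {row}")
    return lines


# Hypothetical desk lengths in millimetres, invented for illustration only.
desk_lengths_mm = [1198, 1200, 1201, 1199, 1202, 1200, 1197, 1203, 1200, 1215]
for line in stemplot(desk_lengths_mm):
    print(line)
```

A value such as 1215 sits on its own stem, which makes it an obvious candidate for the discussion about measurement errors.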

Photocopy the hand of the principal and ask students to take various measurements. Some measurements, such as the length of the index finger, are relatively straightforward but others, such as the length of the thumb, are not, as there is no obvious place where the thumb stops and the palm starts. This should give rise to a discussion about how reliability and validity of measurements can be obtained.

Students can repeatedly roll a ball down an incline onto some carpet and measure how far the ball rolls. In theory the distances should be approximately normally distributed. Working in groups, students in the class can use a beam balance to measure the weight of a full can of Coke, and a graphical display of the set of measurements can be produced. Students can count the number of beans in a cup of beans. And so on.

Bivariate Data and Regression

It makes sense to divide the data into those datasets which can be modelled with a linear function and those that are best modelled with a non-linear model, as these topics are covered at different times in Maths B.

Linear Data

Here are some ideas on generating linear datasets from Alice Hankla, Galloway School, Atlanta, Georgia, USA.

And a few more ideas

Measure the height and the reach of students. Is it better to express the reach as a length or as a percentage of the height? Is there a significant difference between males and females (or short and tall people) if we express the difference in reach and height in percentage terms rather than as an absolute figure?
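For classes with access to a computer, the line of best fit and the correlation coefficient for height and reach can be computed directly from their definitions. The sketch below is one possible approach, not a prescribed method. The function name least_squares and the measurements are hypothetical, invented purely to show the calculation.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (slope, intercept, r) for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Hypothetical (height, reach) measurements in centimetres, for illustration only.
heights = [158, 163, 167, 170, 174, 179, 183]
reaches = [156, 164, 166, 172, 175, 181, 186]

slope, intercept, r = least_squares(heights, reaches)
print(f"reach = {slope:.2f} * height + {intercept:.1f}   (r = {r:.3f})")

# Reach expressed as a percentage of height, as the question above suggests.
percentages = [100 * reach / height for height, reach in zip(heights, reaches)]
print([round(p, 1) for p in percentages])
```

Comparing the scatterplot of raw reach with the list of percentages gives students something concrete to argue about when deciding which form is more informative.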

If you did the ‘principal’s palm’ activity earlier, you can now take any two sets of measurements (e.g. the length of the index finger and the length of the thumb) and plot them on a scatterplot. If you want a non-linear relation, plot index finger length against the area of the palm. It should give rise to a quadratic function.

Nonlinear Data

Here are some data-gathering ideas from Alice Hankla, Galloway School, Atlanta, Georgia, USA.

Nonlinear and Mostly Physics

From Al Coon’s Class

Here are some ideas from Al Coon’s 1996 class of AP Statistics students. Some of these ideas may give rise to linear relationships and others give rise to nonlinear relationships.

From Peter Gotley, All Saints School in Queensland, Australia

Here are a few other non-linear modelling applications from Peter Gotley, All Saints School, Queensland, Australia.

And more ideas

Hypothesis Testing and Confidence Intervals

The Truth is Staring Us in the Back

Here is a neat story that I found in the Chance Database. The article was written by Robert Matthews and published in the November 2, 1997 edition of the Sunday Telegraph.

Matthews reports experiments by biologist Rupert Sheldrake, designed to show that it is possible for people to tell when someone is staring at them from behind. Like ESP, this is something that most people believe they have experienced but which sceptics will challenge. Matthews reports that Sheldrake has carried out a simple experiment with a large number of pairs of children. In a single experiment, say with Jim and Mary, Jim is blindfolded with his back to Mary. Mary then decides, by referring to a list of random 0's and 1's, whether to stare or not stare at Jim. Jim then records whether or not he thinks he is being stared at. This is repeated for a sequence of, say, 20 trials. Matthews reports that, in a total of more than 18,000 trials carried out worldwide, the children reported that they were being stared at in 60% of the trials when they were being stared at and in 50% when they were not. To avoid claims of communication, Sheldrake carried out similar experiments with a window between the two children.
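If you want to run this in class, the random list of 0's and 1's and the two percentages can be produced as follows. This is only a sketch of the bookkeeping, not Sheldrake's own procedure. The function names make_schedule and said_yes_rates are hypothetical, and the responses shown are placeholders to be replaced by what the blindfolded student actually says.

```python
import random


def make_schedule(n_trials, seed=None):
    """Return a random list of 0's and 1's (1 = stare, 0 = do not stare)."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n_trials)]


def proportion(flags):
    """Proportion of 1's in a list, or NaN if the list is empty."""
    return sum(flags) / len(flags) if flags else float("nan")


def said_yes_rates(schedule, responses):
    """Return the proportion of 'I am being stared at' answers
    (a) in trials with staring and (b) in trials without staring."""
    when_stared = [r for s, r in zip(schedule, responses) if s == 1]
    when_not = [r for s, r in zip(schedule, responses) if s == 0]
    return proportion(when_stared), proportion(when_not)


schedule = make_schedule(20)
print("Schedule for the starer:", schedule)

# Placeholder responses (1 = "yes, I am being stared at"); replace with real answers.
responses = [0] * len(schedule)
print(said_yes_rates(schedule, responses))
```

Keeping the two proportions separate mirrors the way the results are reported in the article: one figure for trials with staring and one for trials without.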

On his ‘The Sense of Being Stared At’ page, Sheldrake describes in detail his experiment and suggests that you try it out with children or students. It is simple enough to make an interesting activity for a class and, if it is as successful as Sheldrake predicts it will be, this will be quite an exciting activity.

There are a number of alternative methods of gathering the data discussed on the website, so I suggest you visit if you are interested in this. (Disclaimer: I am a sceptic, especially after reading on another page that Sheldrake collects anecdotal stories on psychic pets.)

Ruler Drop - A Two-Sample Sign Test

Problem: Is there a difference in the reflex times of boys and girls in year 12?

We will gather data on this through an experiment called the Ruler Drop. One method of doing the Ruler Drop is given below, but the students should determine for themselves how the experiment should be designed to ensure reliability and validity of results.

Form groups of three - one girl, one boy, and one of either sex (a group of two is OK). Experimenter (E) holds a ruler by its far end. Subject (S) holds hand in a relaxed position centred at 0 cm mark. E drops ruler. S grabs it. Recorder (R) records the centimetre mark just above the thumb. The smaller the number, the faster the reflexes. Repeat 20 times.

Before you begin, agree as a class on the conditions of the experiment so the results are fair and unbiased. For example, E shouldn't try to 'trick' S by pretending to drop the ruler first. For each subject, do the experiment a few times without recording the result to establish a fair procedure.

Repeat for all three students in the group.

You should now have data on all Maths A students in year 12. Either the sign test or a two-sample t-test could be used to determine if there is a significant difference between boys and girls. As always, the students should discuss the method of collecting the data.
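For the t-test option, the test statistic can be calculated as shown below. This sketch assumes Welch's version of the two-sample t-test, which does not require the two groups to have equal variances. That choice, the function name welch_t and the catch distances are all assumptions made for illustration. The resulting t value and degrees of freedom still need to be compared with a t table or a calculator to obtain a p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical mean catch distances in centimetres, one value per student.
girls = [14.2, 16.8, 12.5, 15.1, 17.3, 13.9, 15.6]
boys = [13.1, 15.9, 14.4, 12.2, 16.0, 13.7, 14.8]

t, df = welch_t(girls, boys)
print(f"t = {t:.3f}, df = {df:.1f}")
```

Using one summary value per student, such as the mean of the 20 drops, keeps the observations independent of one another, which the t-test requires.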

Tossing Drawing Pins - One-Sample Sign Test

In this activity we are comparing a population parameter against a given value. In particular we are comparing the probability that a drawing pin lands Point Up to the value p = 0.5.

Problem: Is it equally likely that a drawing pin dropped on the table will land point up or point down?

Drop a drawing pin on your desk. Record whether it lands ‘Point Up’ or ‘Point Down’. Repeat this 20 times.

Use the sign test to determine if the results are consistent with the hypothesis that ‘the probability that the drawing pin will land point up is one-half.’
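The p-value for this test comes from the binomial distribution with n = 20 and p = 0.5, and can be computed exactly. The sketch below is one way to do it. The function names and the count of 14 point-up results are hypothetical, chosen only to illustrate the calculation.

```python
from math import comb


def upper_tail(k, n):
    """P(X >= k) when X is binomial with n trials and p = 0.5."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def sign_test_two_sided(point_up, n):
    """Two-sided p-value for H0: P(point up) = 0.5.

    Doubles the probability of a result at least as far from n/2
    as the one observed, capped at 1.
    """
    extreme = max(point_up, n - point_up)
    return min(1.0, 2 * upper_tail(extreme, n))


# Hypothetical result: 14 of 20 drops land point up.
p_value = sign_test_two_sided(14, 20)
print(f"p-value = {p_value:.4f}")
```

A small p-value suggests that the results are not consistent with the hypothesis that point up and point down are equally likely. Pooling the results of the whole class gives a larger n and a more sensitive test.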

Estimating Time - Paired-Sample Sign Test

In this activity we compare values that are naturally paired with each other, rather than randomly assigned.

Problem: Can a person better estimate the length of a time interval after practice?

Discuss the procedures necessary to gather reliable, unbiased data. Then perform the experiment. Formulate a null hypothesis and alternative hypothesis and conduct the hypothesis test.
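One possible analysis is sketched below. It assumes that each student's estimate is recorded as an absolute error in seconds before and after practice, and that the alternative hypothesis is one-sided (practice improves the estimate). These design choices, the function name paired_sign_test and the data are illustrative assumptions. Students may well settle on a different design.

```python
from math import comb


def paired_sign_test(before, after):
    """Return (improved, worsened, p_value) for a one-sided paired sign test.

    A pair counts as 'improved' when the error after practice is smaller.
    Tied pairs are dropped, as is usual for the sign test.
    """
    diffs = [b - a for b, a in zip(before, after) if b != a]
    n = len(diffs)
    improved = sum(d > 0 for d in diffs)
    # P(X >= improved) when X is binomial with n trials and p = 0.5.
    p_value = sum(comb(n, i) for i in range(improved, n + 1)) / 2 ** n
    return improved, n - improved, p_value


# Hypothetical absolute errors in seconds when estimating a fixed interval.
error_before = [9, 4, 12, 7, 3, 10, 6, 8, 5, 11]
error_after = [5, 4, 8, 9, 2, 6, 3, 7, 6, 4]

improved, worsened, p_value = paired_sign_test(error_before, error_after)
print(f"{improved} improved, {worsened} worsened, p-value = {p_value:.4f}")
```

Because each student supplies both values, only the sign of each difference is used, not its size.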

It would be very instructive to also treat this as a two-sample test, and see if we arrive at the same conclusion.

Other Topics for Hypothesis Testing Using the Sign Test

These are only suggestions - your students should be able to come up with many others.