Activity Planning Worksheet

Group # 2

Wib Leonard

Nick Pajewski

Megan Marchini

Learning objectives
Explain how outlying observations affect numeric summary measures of central tendency and dispersion
Explain why rank-based statistics are more robust to outlying observations
Recognize outlying observations graphically through boxplots, etc.

Context
Graphically represent distributions (histograms, boxplot, etc.)
Calculate summary measures of central tendency and dispersion (mean, median, mode, standard deviation, interquartile range)

Mechanics
Activity assumes each student will have a calculator
Worksheet will be given to each group in order to guide data collection
Break students into groups of 5 (assuming a class of roughly 30 students)
Have students collect and record their ages, as well as the ages of any siblings, on the attached worksheet
Have students also record the age of the oldest grandparent in the group
Have students compute summary measures for each of the two datasets
(1) without the grandparent's age, (2) with the grandparent's age
Have students put their computed statistics on board in order to compare results across the class
This comparison should illustrate how the effect of an outlier diminishes as the sample size increases
May need to combine groups to illustrate this point
If necessary, computer-based presentation to formalize concepts
Consider using a dataset like the Shark dataset (Agresti page 45) and setting up an Excel spreadsheet like the one in the file shark_example.xls
Or, for a more visual presentation, an applet like the one at … can be used.
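
The computation each group carries out by hand can also be sketched in a few lines of Python, for instance to check results written on the board. This is only an illustration: the ages below are hypothetical, the helper name summarize is our own, and the quartiles use the default method of the standard library's statistics.quantiles, which may differ slightly from the rule taught in class.

```python
import statistics


def summarize(ages):
    """Return the summary measures used in the activity for one dataset."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(ages, n=4)
    return {
        "mean": statistics.mean(ages),
        "median": statistics.median(ages),
        "sd": statistics.stdev(ages),  # sample standard deviation
        "iqr": q3 - q1,
    }


# Hypothetical data: ages of group members and their siblings.
group_ages = [18, 19, 19, 20, 21, 22, 16, 24]
# Hypothetical age of the oldest grandparent in the group.
grandparent_age = 85

without_gp = summarize(group_ages)
with_gp = summarize(group_ages + [grandparent_age])

# Print each measure without and with the grandparent's age.
for name in without_gp:
    print(f"{name:>6}: {without_gp[name]:6.2f} -> {with_gp[name]:6.2f}")
```

With these made-up ages, the mean and standard deviation move far more than the median and interquartile range when the grandparent is added, which is the pattern the class comparison is meant to bring out.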

Variety
Basic calculations of summary measures
Thinking about distributional shapes
Thinking about how outliers adjust summary statistics

Summary
When using only the ages of your group members and their siblings, the mean, median, and mode should be similar. However, when adding the age of the oldest grandparent, we would expect the mean to be greater than the median.
Summary measures like the mean and sample standard deviation are more sensitive to outlying observations than rank-based measures like the median and interquartile range.
The effect of outliers diminishes as the sample size increases.
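
The last point can be made concrete for the mean. If n ages have mean m and one outlying age x is added, the new mean is (n*m + x)/(n + 1), so the mean shifts by (x - m)/(n + 1), which shrinks as n grows. A minimal sketch of this, using hypothetical ages and a helper name (mean_shift) of our own, mimics combining groups of five:

```python
import statistics

# Hypothetical ages for one group of five; repeated to mimic combined groups.
base_pattern = [18, 19, 20, 21, 22]
# Hypothetical age of the oldest grandparent (the outlier).
outlier = 85


def mean_shift(n_groups):
    """How much the mean rises when one outlier joins n_groups combined groups."""
    ages = base_pattern * n_groups
    return statistics.mean(ages + [outlier]) - statistics.mean(ages)


# Sample sizes 5, 10, and 30 (one group, two groups, the whole class).
shifts = {5 * k: mean_shift(k) for k in (1, 2, 6)}

for n, shift in shifts.items():
    print(f"n = {n:2d}: mean rises by {shift:.2f} years")
```

The same single outlier moves the mean less and less as groups are combined, which is why pooling groups on the board helps illustrate the point.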

Follow-up
Students will be asked a question on this topic during the next quiz or exam
A follow-up discussion should center on what to do with outlying observations (exclude from analysis, provide separate analyses, etc.)
As a long-term extension, this activity ties into hypothesis testing in two ways
First, it can illustrate the effect of outliers on parametric tests, like the two-sample t-test, where population means are compared
Second, it provides a motivation for using rank-based non-parametric tests in the situation where outliers cause violations in model assumptions
This activity also introduces the concept of outliers for use in the linear regression setting (residual analysis, influential observations, etc.)
This can also be tied into a discussion on data transformations, such as using a log transformation, to lessen an outlier’s effect
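
As one possible illustration of the transformation point, the sketch below compares how far a single outlier pulls the mean on the raw scale and on the natural-log scale. The ages are hypothetical, and expressing the shift in units of the outlier-free standard deviation is our own choice of yardstick, made so that the two scales can be compared.

```python
import math
import statistics

# Hypothetical ages of group members and siblings, plus one grandparent.
group_ages = [18, 19, 19, 20, 21, 22, 16, 24]
grandparent_age = 85


def outlier_effect(values, outlier):
    """Shift in the mean caused by the outlier, in outlier-free SD units."""
    shift = statistics.mean(values + [outlier]) - statistics.mean(values)
    return shift / statistics.stdev(values)


# Effect on the original scale (years).
raw_effect = outlier_effect(group_ages, grandparent_age)

# Effect after a natural-log transformation (ages are positive, so log is defined).
log_ages = [math.log(a) for a in group_ages]
log_effect = outlier_effect(log_ages, math.log(grandparent_age))

print(f"raw scale: mean shifts by {raw_effect:.2f} SDs")
print(f"log scale: mean shifts by {log_effect:.2f} SDs")
```

For these made-up ages the shift is smaller on the log scale, since the transformation compresses large values more than small ones.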