Group #2
Wib Leonard
Nick Pajewski
Megan Marchini
Activity Planning Worksheet
- Learning objectives
- Explain how outlying observations affect numeric summary measures of central tendency and dispersion
- Explain why rank-based statistics are more robust to outlying observations
- Recognize outlying observations graphically through boxplots, etc.
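Boxplots commonly flag a point as an outlier when it lies more than 1.5 × IQR below the first quartile or above the third quartile (Tukey's rule). The Python sketch below applies that check. The ages are hypothetical, and the quartiles use the "inclusive" method of `statistics.quantiles`, which is only one of several conventions; hand calculations or the course text may use another, so borderline points could be classified differently.

```python
import statistics


def tukey_fences(data, k=1.5):
    """Return (lower, upper) boxplot fences: Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" interpolation is an assumed quartile convention.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(data, k=1.5):
    """Return the observations that fall outside the fences."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]


# Hypothetical ages: five students and four siblings, then the same plus a grandparent.
ages_without = [19, 20, 20, 21, 22, 15, 17, 23, 25]
ages_with = ages_without + [84]

print(flag_outliers(ages_without))  # → []
print(flag_outliers(ages_with))     # → [84]
```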
- Context
- Graphically represent distributions (histograms, boxplots, etc.)
- Calculate summary measures of central tendency and dispersion (mean, median, mode, standard deviation, interquartile range)
- Mechanics
- The activity assumes each student has a calculator
- A worksheet will be given to each group to guide data collection
- Break students into groups of 5 (assuming a class of roughly 30 students)
- Have students collect and record their own ages, as well as the ages of any siblings, on the attached worksheet
- Have students also record the age of the oldest grandparent in the group
- Have students compute summary measures for each of two datasets: (1) without the grandparent and (2) with the grandparent
- Have students write their computed statistics on the board so that results can be compared across the class
- This comparison should illustrate how the effect of an outlier diminishes as the sample size increases
- Groups may need to be combined to illustrate this point
- If necessary, use a computer-based presentation to formalize the concepts
- Consider using a dataset such as the Shark dataset (Agresti, page 45) and setting up an Excel spreadsheet like the one in the file shark_example.xls
- Or, for a more visual presentation, an applet like the one at … can be used
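The worksheet step that compares the two datasets can be checked with a short Python sketch. The ages below are invented for illustration (five students, four siblings, and one grandparent aged 84), and the IQR uses the "inclusive" quartile method of `statistics.quantiles`, which is an assumption and may not match the convention students use by hand.

```python
import statistics


def summarize(data):
    """Compute the summary measures students are asked to find."""
    # Quartile convention ("inclusive") is an assumption; others give slightly different IQRs.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (n - 1 divisor)
        "iqr": q3 - q1,
    }


# Hypothetical ages (not real class data).
without_gp = [19, 20, 20, 21, 22, 15, 17, 23, 25]
with_gp = without_gp + [84]  # hypothetical oldest grandparent

for label, data in (("without grandparent", without_gp), ("with grandparent", with_gp)):
    stats = summarize(data)
    print(label, {name: round(value, 2) for name, value in stats.items()})
```

With these made-up ages, adding the grandparent moves the mean and standard deviation substantially while the median and IQR barely change, which is the pattern the class comparison is meant to reveal.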
- Variety
- Basic calculations of summary measures
- Thinking about distributional shapes
- Thinking about how outliers adjust summary statistics
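One way to see why rank-based statistics resist outliers is to make the oldest age even more extreme. The median and quartiles depend only on the ordering of the values, so they stay put, while the mean and standard deviation keep moving. The Python sketch below uses hypothetical ages and the same assumed "inclusive" quartile convention.

```python
import statistics


def center_and_spread(data):
    """Return mean, standard deviation, median, and IQR (inclusive quartiles)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),
        "median": statistics.median(data),
        "iqr": q3 - q1,
    }


# Hypothetical group ages; only the largest value differs between the two runs.
group = [19, 20, 20, 21, 22, 15, 17, 23, 25]
moderate = center_and_spread(group + [84])
extreme = center_and_spread(group + [104])

for name in ("mean", "stdev", "median", "iqr"):
    print(f"{name:>6}: {moderate[name]:7.2f} -> {extreme[name]:7.2f}")
```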
- Summary
- When using only the ages of group members and their siblings, the mean, median, and mode should be similar. However, after adding the age of the oldest grandparent, we would expect the mean to be noticeably greater than the median, because the single large value pulls the mean upward.
- Summary measures like the mean and sample standard deviation are more sensitive to outlying observations than rank-based measures like the median and interquartile range.
- The effect of a single outlier on summary measures such as the mean diminishes as the sample size increases.
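The sample-size point follows from a simple identity: adding one value x to n values whose mean is m gives a new mean of (n*m + x)/(n + 1), so the mean moves by (x - m)/(n + 1). The Python sketch below assumes, hypothetically, that combining groups can be imitated by repeating one group's ages, with a single grandparent added to the pooled sample.

```python
import statistics


def shift_from_one_outlier(sample, outlier):
    """Return (mean shift, median shift) caused by appending one outlier."""
    with_outlier = sample + [outlier]
    mean_shift = statistics.mean(with_outlier) - statistics.mean(sample)
    median_shift = statistics.median(with_outlier) - statistics.median(sample)
    return mean_shift, median_shift


# Hypothetical ages for one group; k copies stand in for k combined groups.
group = [19, 20, 20, 21, 22, 15, 17, 23, 25]
grandparent = 84

for k in (1, 2, 4):
    pooled = group * k
    mean_shift, median_shift = shift_from_one_outlier(pooled, grandparent)
    print(f"n = {len(pooled):2d}: mean shifts {mean_shift:.2f}, median shifts {median_shift:.2f}")
```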
- Follow-up
- Students will be asked a question on this topic on the next quiz or exam
- A follow-up discussion should center on what to do with outlying observations (exclude them from the analysis, provide separate analyses, etc.)
- As a longer-term extension, this activity ties into hypothesis testing in two ways
- First, it can illustrate the effect of outliers on parametric tests, such as the two-sample t-test, which compare population means
- Second, it motivates the use of rank-based nonparametric tests when outliers cause violations of model assumptions
- This activity also introduces the concept of outliers for later use in the linear regression setting (residual analysis, influential observations, etc.)
- This can also be tied into a discussion of data transformations, such as a log transformation, that lessen an outlier's effect
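For the transformation discussion, one quick comparison is to average the ages on the log scale and back-transform (which gives the geometric mean), then compare how much one grandparent moves that center versus the ordinary mean. This Python sketch uses hypothetical ages; it illustrates that the log scale dampens the outlier's pull, not that a log transformation is always the right remedy.

```python
import math
import statistics


def log_scale_mean(data):
    """Mean of log(age), back-transformed to the original scale (the geometric mean)."""
    return math.exp(statistics.fmean(math.log(x) for x in data))


def relative_shift(sample, outlier, center):
    """Factor by which one outlier multiplies a measure of center (1.0 means no change)."""
    return center(sample + [outlier]) / center(sample)


# Hypothetical ages; all values must be positive for the log to be defined.
ages = [19, 20, 20, 21, 22, 15, 17, 23, 25]
grandparent = 84

raw = relative_shift(ages, grandparent, statistics.mean)
logged = relative_shift(ages, grandparent, log_scale_mean)
print(f"ordinary mean grows by {100 * (raw - 1):.1f}%")
print(f"log-scale mean grows by {100 * (logged - 1):.1f}%")
```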