Comparing Data Distributions
Question:
How can statistics from data distributions be used to compare sets of data?
Launch
Ninth grade students in an English class were surveyed to find out about how many times during the last year they saw a movie in a theater. The results are listed below.
0, 0, 16, 20, 1, 17, 16, 1, 3, 1, 17, 19, 1, 2, 12, 15, 4, 1
- Find the median of the data distribution.
- What is the range of the distribution?
- Find the first and third quartiles (lower and upper quartiles).
- Construct and label a box-and-whiskers plot using the scale shown.
- What is the interquartile range (IQR) for this data distribution?
- What does the interquartile range (IQR) indicate about a distribution?
- What is the mean of the data distribution?
- What is the mode of the data distribution?
- Would the mean, the median, or the mode be the most appropriate measure to represent how many times the “average” student in this class saw a movie in a theater during the past year? Why?
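One way to check hand calculations for the Launch questions is the short Python sketch below. It assumes the "median of each half" method for quartiles (the middle value is left out of both halves when the count is odd); other textbooks and software use slightly different quartile rules, so results may differ. The names `movies` and `quartiles` are illustrative only and are not part of the worksheet.

```python
from statistics import mean, median, multimode

# Survey data from the Launch: movies seen in a theater during the last year.
movies = [0, 0, 16, 20, 1, 17, 16, 1, 3, 1, 17, 19, 1, 2, 12, 15, 4, 1]

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: with an odd count, the middle value belongs to neither half.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)

q1, q3 = quartiles(movies)
data_range = max(movies) - min(movies)
iqr = q3 - q1

print("median:", median(movies))
print("range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("mean:", round(mean(movies), 2))
print("mode(s):", multimode(movies))
```

Comparing the printed mean with the printed median is a useful starting point for the last Launch question.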
Investigation:
From the Georgia School Newspaper:
Twin Sisters, Ashley and Mary Kate, Are Teachers!
“Twins Ashley and Mary Kate Colson, age 30, are teachers in the same school district. Although Ashley teaches kindergarten and Mary Kate teaches 9th grade mathematics, they frequently collaborate on ideas for teaching their students and study differences in the way their students understand mathematical concepts. Recently they decided to investigate how their students’ perceptions of adult ages compare by asking them the question ‘How old do you think your teacher is?’”
Ashley and Mary Kate both asked a sample of their students to guess their age with the following results:
Ashley (Kindergarten Teacher) / Mary Kate (9th Grade Math Teacher)
32 / 26
28 / 36
12 / 31
70 / 28
31 / 28
41 / 28
13 / 34
28 / 33
37 / 30
10 / 26
/ 30
/ 28
- Find the mean, the median, and the mode of each data distribution. Compare the two groups based on their measures of central tendency.
- Based on your findings, do you think that both groups are equally accurate at guessing the age of their teacher? Why or why not?
- What do you notice about the variability of “age guesses” between the two data distributions?
- For each group determine the following:
Kindergarten Teacher / 9th grade Math Teacher
Range
Q1
Median
Q3
Interquartile Range
Maximum Value
Minimum Value
- Construct and label Box-and-Whiskers plots (one on top of the other) for both data distributions using the scale shown.
- Compare the range and interquartile range for each group to determine the variability within each group. Comment on what you notice.
- Are there any outliers in either of the two groups? Show how you know.
- How does an outlier affect the mean of a distribution? How does an outlier affect the median of a distribution?
- Summarize your findings to determine which group is better at guessing the age of their teacher. Explain how you know.
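The table and outlier questions above can be checked with a sketch like the following. Two assumptions are not stated in the worksheet: quartiles are taken as the medians of the lower and upper halves of the sorted data, and an outlier is any value more than 1.5 × IQR below Q1 or above Q3. The function names are illustrative.

```python
from statistics import median

ashley = [32, 28, 12, 70, 31, 41, 13, 28, 37, 10]
mary_kate = [26, 36, 31, 28, 28, 28, 34, 33, 30, 26, 30, 28]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using median-of-halves quartiles."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]

def outliers(values):
    """Values outside the 1.5 * IQR fences (an assumed, commonly taught rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

for name, guesses in (("Ashley", ashley), ("Mary Kate", mary_kate)):
    minimum, q1, med, q3, maximum = five_number_summary(guesses)
    print(name)
    print("  min, Q1, median, Q3, max:", minimum, q1, med, q3, maximum)
    print("  range:", maximum - minimum, " IQR:", q3 - q1)
    print("  outliers by the 1.5 * IQR rule:", outliers(guesses))
```

Under these particular assumptions the upper fence for the kindergarten guesses works out to 73, so the guess of 70 sits just inside it. Whether 70 counts as an outlier therefore depends on the definition the class adopts.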
Mean Absolute Deviation
Comparing measures of central tendency between groups can sometimes lead one to believe that groups are similar. Comparing variability or spread of data between groups can often yield differences that at first were not apparent. It is important to look at the variability of data distributions when comparing differences and similarities between and among groups.
Average Deviation From the Mean
- Determine the difference between each piece of data and the mean of its distribution. What is the average deviation from the mean for each group? Explain your results. Will these results always occur? Why or why not?
Ashley (K teacher) / X − Mean / Mary Kate (9th grade teacher) / X − Mean
32 / 26
28 / 36
12 / 31
70 / 28
31 / 28
41 / 28
13 / 34
28 / 33
37 / 30
10 / 26
/ 30
/ 28
Sum / Sum
Mean / Mean
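The signed deviations in the table above can be checked with a small sketch. The function name `average_deviation` is illustrative, and results are rounded because floating-point arithmetic can leave a tiny remainder.

```python
from statistics import mean

ashley = [32, 28, 12, 70, 31, 41, 13, 28, 37, 10]
mary_kate = [26, 36, 31, 28, 28, 28, 34, 33, 30, 26, 30, 28]

def average_deviation(values):
    """Average of the signed differences (value minus mean)."""
    center = mean(values)
    deviations = [x - center for x in values]
    return sum(deviations) / len(values)

for name, guesses in (("Ashley", ashley), ("Mary Kate", mary_kate)):
    print(name, "average signed deviation:", round(average_deviation(guesses), 6))
```

Trying the function on other lists of numbers is one way to explore whether the same result always occurs.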
One way of comparing variability between groups is to look at the mean absolute deviation (MAD). The mean absolute deviation is the arithmetic average of the absolute values of the differences between each value and the mean of a distribution. The larger the value of the MAD, the more spread out the values are from the mean. When comparing variability of data distributions using the MAD, a distribution with a larger MAD has more erratic values, while a distribution with a smaller MAD has more consistent values.
- Explain why taking the absolute value of the differences changes the average variability of the groups.
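Written as a formula, the definition above can be sketched as follows. The symbols are assumptions introduced for illustration and do not appear in the worksheet: x_i stands for each data value, x̄ for the mean of the distribution, and n for the number of values.

```latex
\mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \bar{x} \right|,
\qquad \text{where } \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```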
- Determine the Mean Absolute Deviation (MAD) for each data set.
Ashley (K teacher) / X − Mean / |X − Mean| / Mary Kate (9th grade teacher) / X − Mean / |X − Mean|
32 / 26
28 / 36
12 / 31
70 / 28
31 / 28
41 / 28
13 / 34
28 / 33
37 / 30
10 / 26
/ 30
/ 28
Sum / Sum
Mean / Mean
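Once the table above is filled in by hand, the totals can be checked with a sketch like this one. It follows the definition given earlier (the average of the absolute differences from the mean); the function name is illustrative.

```python
from statistics import mean

ashley = [32, 28, 12, 70, 31, 41, 13, 28, 37, 10]
mary_kate = [26, 36, 31, 28, 28, 28, 34, 33, 30, 26, 30, 28]

def mean_absolute_deviation(values):
    """Average distance of each value from the mean of its distribution."""
    center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)

for name, guesses in (("Ashley", ashley), ("Mary Kate", mary_kate)):
    print(f"{name}: mean = {mean(guesses):.2f}, "
          f"MAD = {mean_absolute_deviation(guesses):.2f}")
```

Because each distance is made non-negative before averaging, positive and negative deviations no longer cancel each other.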
- Based on this investigation, write a summary and statistical assessment for Ashley and Mary Kate’s article about how students of different ages estimate their teacher’s age. Which group is more consistent in guessing their teacher’s age? Include statistics that support your conclusions.
Conclusions:
Statistics that compare measures of central tendency between different sets of data include ______, ______, and ______.
Statistics that compare measures of variability or dispersion between different sets of data include ______, ______, and ______.
The mean absolute deviation is the mean of ______
______.
In Class Problems:
- The box plots shown represent pulse rates per minute for random samples of 100 people in each of four age groups.
[Figure: box plots of pulse rates per minute for four age groups: newborn babies, 6-year olds, 15-year olds, and 35-year olds]
Complete the chart below for each group:
Newborns / 6-yr olds / 15-yr olds / 35-yr olds
Range
Q1
Median
Q3
Interquartile Range
Maximum Value
Minimum Value
Compare the results and summarize your findings. What do these plots indicate about pulse rates as people get older?
- If two data sets have the same interquartile range do they also have the same median? Give a specific example of values for two data sets that illustrate your conclusion.
- When can the median of a data set be a better measure of central tendency than the mean? Give a specific example.
- Class Experiment: Estimating the length of a piece of string.
Divide the class into two groups (for example, boys/girls). The teacher holds up a piece of string in front of the class (between 10 and 30 inches long). Students estimate the length of the string in inches and in centimeters. Record all individual estimates for each group on the “Data Collection Sheet.”
For each data distribution:
- Determine measures of central tendency (mean, median, and mode)
- Determine measures of variability (range, interquartile range, and mean absolute deviation).
- Construct box-and-whisker plots for each data distribution.
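If a class wants to check its hand calculations, the three steps above could be collected into one helper, sketched below. The estimates shown are made-up placeholder values, not real class data, and the quartiles again assume the median-of-halves method. The names `summarize`, `group_a`, and `group_b` are hypothetical.

```python
from statistics import mean, median, multimode

def summarize(values):
    """Return central tendency and variability statistics for one data set."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    center = mean(ordered)
    return {
        "mean": center,
        "median": median(ordered),
        "mode": multimode(ordered),
        "range": ordered[-1] - ordered[0],
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
        "MAD": sum(abs(x - center) for x in ordered) / len(ordered),
    }

# Hypothetical estimates in inches, for illustration only.
group_a = [18, 20, 22, 24, 25, 30, 15]
group_b = [20, 21, 22, 22, 23, 24]

for label, estimates in (("Group A", group_a), ("Group B", group_b)):
    print(label)
    for statistic, value in summarize(estimates).items():
        print(f"  {statistic}: {value}")
```

The same function can be called once per group and per unit (inches, centimeters) to fill in the summary sheet.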
When completed, each group presents a summary of their results to the entire class for comparisons. Students record statistics from each group on the “Comparing Data Distributions” sheet.
The teacher then measures the piece of string in inches and in centimeters to determine the actual length.
How did the groups compare in their ability to estimate the length of the string?
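For the two in-class questions above about interquartile range and about choosing the median over the mean, a small sketch with made-up numbers may help students test their own examples. All data values here are hypothetical, and the IQR uses the median-of-halves quartile method as an assumption.

```python
from statistics import mean, median

def iqr(values):
    """Interquartile range using medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[len(ordered) - half:]) - median(ordered[:half])

# Two hypothetical data sets with the same spread but different centers.
set_a = [1, 2, 3, 4, 5]
set_b = [11, 12, 13, 14, 15]
print("IQRs:", iqr(set_a), iqr(set_b))
print("medians:", median(set_a), median(set_b))

# A hypothetical data set with one extreme value.
skewed = [20, 22, 23, 25, 200]
print("mean:", mean(skewed), "median:", median(skewed))
```

Students can replace the lists with their own values to see how the statistics respond.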
Closure:
How can statistics be used to compare sets of data?
Homework:
Write an analysis of the results of the “Estimating the length of a piece of string” experiment. Compare the two groups’ abilities to estimate the length of the piece of string in inches and in centimeters. Justify your comparisons by explaining how you know.
- How do the statistics compare between the two groups?
- Which group was better at estimating in inches? In centimeters?
- Which group was more consistent in estimating the length of string?
- Which group shows more variability in their estimates?
- Are students more accurate when estimating in inches or in centimeters?
Data Collection Sheet
Experiment: Estimating the length of a piece of string.
Name / Estimate (Inches) / Estimate (Centimeters)
Mean ______ Mode ______
Median ______ Q1 ______
Range ______ Q3 ______
Interquartile Range ______
Mean Absolute Deviation ______
6. Draw and label a Box-and-Whiskers plot for each data set using the appropriate scale shown.
Comparing Data Distributions
Summary Data
Group ______ / Group ______
Inches / Centimeters / Inches / Centimeters
Mean
Mode
Median
Range
Maximum
Minimum
Q1
Q3
Interquartile Range
Mean Absolute Deviation
7. Draw Box-and-Whisker plots to compare results from both groups using the appropriate scales shown. On the “Inches” scale draw plots for each group. On the “Centimeter” scale draw plots for each group.
2/26/07