AP Statistics

Chapter 1: Exploring Data

Section 1.2: Describing Distributions with Numbers

Shape: x̅ < M suggests skewed left, x̅ > M suggests skewed right, x̅ ≈ M suggests roughly symmetric

Center:

Mean

Median

Mode – (used as a measure of center only for categorical data; if there are 2 modes, the distribution is bimodal)

Spread:

Range

Interquartile Range (IQR, ALWAYS state even if not drawing a box plot)

Standard Deviation

Mean:

Is the arithmetic average.

Formula:

x = ∑xi

n

Not resistant to the influence of extreme observations.

Sample mean: x̅ Population mean: μ (mu)

Median - M

The midpoint of a distribution

The number such that half the observations are smaller and the other half are larger.

If the number of observations n is odd, the median is the center of the ordered list.

If the number of observations n is even, the median M is the mean of the two center observations in the ordered list.

Is resistant to the influence of extreme observations.
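To see what "resistant" means with numbers, here is a minimal Python sketch. Python's standard `statistics` module stands in for the calculator, and the variable names are illustrative only. It uses the Maris HR totals from the exercise in these notes, with and without the suspicious value 95.

```python
from statistics import mean, median

# Roger Maris' HR totals as listed in these notes (95 is the suspicious value).
maris = [8, 16, 23, 14, 26, 33, 39, 95, 28, 13]
maris_without_95 = [x for x in maris if x != 95]

# The mean is pulled toward the extreme value; the median barely moves.
print(mean(maris), median(maris))                                   # 29.5 24.5
print(round(mean(maris_without_95), 2), median(maris_without_95))   # 22.22 23
```

Dropping one extreme observation moves the mean by about 7 home runs but moves the median by only 1.5.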

Parameters vs. Statistics:

Parameters describe populations – Greek letters (μ, σ)

Statistics describe samples of data – English (Roman) letters (x̅, s)

How to load data into a list

How to find statistics about a list of data

For the following data, create a back-to-back stem plot and describe the plot both visually and mathematically. Which measure of center makes more sense?

Babe Ruth’s HR totals: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22

Roger Maris’ HR totals: 8, 16, 23, 14, 26, 33, 39, 95, 28, 13 (95 is not a real HR fyi)
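One possible way to check a hand-drawn back-to-back stem plot is sketched below in Python. The function name `back_to_back_stemplot` is made up for this illustration, and it assumes the stems are tens digits and the leaves are ones digits, which fits this data.

```python
from collections import defaultdict

ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
maris = [8, 16, 23, 14, 26, 33, 39, 95, 28, 13]


def back_to_back_stemplot(left, right):
    """Return the rows of a back-to-back stem plot (stems = tens, leaves = ones)."""
    left_leaves, right_leaves = defaultdict(list), defaultdict(list)
    for value in left:
        left_leaves[value // 10].append(value % 10)
    for value in right:
        right_leaves[value // 10].append(value % 10)

    low = min(left + right) // 10
    high = max(left + right) // 10
    rows = []
    for stem in range(low, high + 1):  # keep empty stems so gaps stay visible
        # Left-hand leaves increase outward from the stem, so reverse them.
        left_text = " ".join(str(d) for d in sorted(left_leaves[stem], reverse=True))
        right_text = " ".join(str(d) for d in sorted(right_leaves[stem]))
        rows.append((left_text, stem, right_text))

    width = max(len(left_text) for left_text, _, _ in rows)
    return [f"{lt:>{width}} | {stem} | {rt}".rstrip() for lt, stem, rt in rows]


# Ruth's leaves are on the left, Maris' leaves are on the right.
for row in back_to_back_stemplot(ruth, maris):
    print(row)
```

The empty stems between 3 and 9 on the Maris side make the value 95 stand out as a gap in the plot.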

Measuring Spread: The Quartiles

Why is spread so important? Create 2 dotplots of the following data. State the mean, median, and range.

1) 3, 3, 3, 3, 3

2) 2, 2, 3, 4, 4
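As a quick check on the two lists above, the sketch below computes the mean, median, and range in Python. The names `data_1` and `data_2` are just labels chosen for this illustration.

```python
from statistics import mean, median

data_1 = [3, 3, 3, 3, 3]
data_2 = [2, 2, 3, 4, 4]

for data in (data_1, data_2):
    data_range = max(data) - min(data)  # largest minus smallest
    print(mean(data), median(data), data_range)
```

Both lists have mean 3 and median 3, but the ranges are 0 and 2. The center alone cannot tell the two distributions apart.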

Range

Difference between largest and smallest points.

Not resistant to the influence of extreme observations.

Interquartile Range (IQR)

Measures the spread of the middle half of the data.

Is resistant to the influence of extreme observations.

Quartile 3 minus Quartile 1.

Quartiles: 25th, 50th, 75th percentiles

Percentiles: Q1 and Q3 are the 25th and 75th percentiles respectively.

Q1 - 25th percentile - means 25% of data falls at or below this value

Q2 - median - 50th percentile

Q3 - 75th percentile - means 75% of data falls at or below this value

95th percentile means 95% of data is at or below this value.

The five number summary

Consists of the min, Q1, median, Q3, and max.

Offers a reasonably complete description of center and spread.

Used to create a boxplot.
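A five number summary can be sketched in Python as follows. This assumes the usual textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the ordered list, with the overall median left out of both halves when n is odd. Other software may use a different quartile rule and give slightly different values. The function name `five_number_summary` is made up for this illustration.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, the overall median is left out of both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
maris = [8, 16, 23, 14, 26, 33, 39, 95, 28, 13]

print(five_number_summary(ruth))   # (22, 35, 46, 54, 60)
print(five_number_summary(maris))  # (8, 14, 24.5, 33, 95)
```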

Boxplot

Shows less detail than histograms or stem plots.

Best used for side-by-side comparison of more than one distribution.

Gives a good indication of symmetry or skewness of a distribution.

Regular boxplots conceal outliers.

Modified boxplots put outliers as isolated points.

For the following data, create a side-by-side box plot, describe the plots, and identify any possible outliers.

Babe Ruth’s HR totals: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22

Roger Maris’ HR totals: 8, 16, 23, 14, 26, 33, 39, 95, 28, 13 (95 is not a real HR fyi)

Min Q1 M Q3 Max

Ruth:

Maris:

The calculator can find the median and quartiles using 1-Var Stats (scroll down to the second page of results).

Outlier

An observation that falls outside the overall pattern of the data

Calculated by using the IQR - interquartile range (Q3-Q1)

Anything smaller than Q1 – 1.5(IQR) or larger than Q3 + 1.5(IQR) is an outlier. These are thresholds.

Do NOT draw thresholds on box plot – they are not data values!
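The 1.5 × IQR rule can be sketched in Python as shown below. The helper names `quartiles` and `find_outliers` are hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the ordered list (other quartile rules can shift the thresholds slightly).

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, the overall median is left out of both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)


def find_outliers(data):
    """Return (lower threshold, upper threshold, outliers) by the 1.5 x IQR rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [x for x in data if x < low or x > high]


ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
maris = [8, 16, 23, 14, 26, 33, 39, 95, 28, 13]

print(find_outliers(ruth))   # (6.5, 82.5, [])
print(find_outliers(maris))  # (-14.5, 61.5, [95])
```

Under these assumptions, Ruth has no outliers and the value 95 in the Maris list is flagged.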

Why 1.5 x IQR? John Tukey, inventor of boxplots, answered that one was not enough and two was too much.

TI-83/84 can display up to three boxplots on the same scale. Use StatPlot. Can select quick or full boxplot. Identify the list where the data is located and the frequency list if used. Use Zoom 9 to draw the graph. (Remember: clear out any equations in “Y=” first!!)

To draw a “full” or “modified” boxplot (modified boxplot: modified to plot outliers as isolated points instead of hiding them in the whiskers)

Identify thresholds: start with the 5-number summary and IQR. “Thresholds” are the values above (or below) which any data values are considered outliers.

Don’t draw thresholds on the boxplot; they are not data values. Only use them to identify outliers.

How to determine shape of distribution from a box plot

For Ruth and Maris, create a modified side-by-side box plot, describe the plots, and identify any outliers.

Homework: p. 74 #27–32; p. 82 #33, 35–38

Measuring Spread:

Why might the Range not give a good picture of the spread of data?

Create two dot plots.

1) 3, 3, 3, 3, 3, 9

2) 2, 2, 3, 4, 6, 6, 8
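A short Python sketch of why the range can mislead: the two lists above have the same range but are spread out very differently. The names `data_1` and `data_2` are illustrative, and `statistics.stdev` is used here as a stand-in for the sample standard deviation s defined in these notes.

```python
from statistics import stdev

data_1 = [3, 3, 3, 3, 3, 9]
data_2 = [2, 2, 3, 4, 6, 6, 8]

for data in (data_1, data_2):
    data_range = max(data) - min(data)  # depends only on the two extreme values
    print(data_range, round(stdev(data), 2))
# prints 6 2.45, then 6 2.3
```

The range is 6 for both lists because it looks only at the largest and smallest points. In the first list a single value (9) creates the entire range.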

Variance (sample data – s²; population data – σ²)

The average of the squares of the deviations of the observations from their mean.

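In symbols, the variance of n observations x₁, x₂, …, xₙ is

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

or

$$s^2 = \frac{1}{n - 1} \sum (x_i - \bar{x})^2$$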

Standard deviation (sample data – s; population data – σ (sigma))

The square root of variance.

Compares observations to their mean

Degrees of Freedom is n – 1.

How to find the mean and standard deviation from their definitions.

With the list of numbers, calculate the variance and standard deviation using the formula. 5, 6, 7, 8, 10, 12

  1. Find the mean (x̅):
  2. Plot the points on a number line:
  3. Find s² and s
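The steps above can be followed line by line in a short Python sketch. This is only one way to lay out the arithmetic, and the variable names are illustrative.

```python
from math import sqrt

data = [5, 6, 7, 8, 10, 12]
n = len(data)

x_bar = sum(data) / n                     # mean: 48 / 6 = 8.0
deviations = [x - x_bar for x in data]    # -3, -2, -1, 0, 2, 4
squared = [d ** 2 for d in deviations]    # 9, 4, 1, 0, 4, 16 (sum is 34)

variance = sum(squared) / (n - 1)         # divide by n - 1, not n: 34 / 5 = 6.8
std_dev = sqrt(variance)                  # about 2.61

print(x_bar, variance, round(std_dev, 2))
```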

Using the calculator, find the variance and standard deviation of these metabolic rates for 7 men who took part in a study:

1792, 1666, 1362, 1614, 1460, 1867, 1439.

1. Find the mean (x̅):

2. Plot the points on a number line:

3. Find s² and s
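If a calculator is not handy, Python's standard `statistics` module gives the same sample statistics. This is a sketch for checking answers, not a replacement for showing the work; `variance` and `stdev` in that module divide by n - 1, matching s² and s in these notes.

```python
from statistics import mean, stdev, variance

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

print(mean(rates))                 # the mean is 1600
print(round(variance(rates), 2))   # 35811.67
print(round(stdev(rates), 2))      # 189.24
```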

Properties of Variance:

Uses squared deviations from the mean because the sum of all the deviations not squared is always zero.

Has square units.

Found by taking an average but dividing by n-1.

The sum of the deviations is always zero, so the last deviation can be found once the other n-1 deviations are known.

This means only n-1 of the squared deviations can vary freely, so the average is found by dividing by n-1.

n-1 is called the degrees of freedom.
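The degrees of freedom idea can be checked numerically. The sketch below reuses the list 5, 6, 7, 8, 10, 12 from the earlier example in these notes; the variable names are illustrative.

```python
data = [5, 6, 7, 8, 10, 12]
x_bar = sum(data) / len(data)
deviations = [x - x_bar for x in data]

# The deviations from the mean always add up to zero.
total = sum(deviations)

# So once the first n - 1 deviations are known, the last one is forced.
last_from_others = -sum(deviations[:-1])

print(total, deviations[-1], last_from_others)  # 0.0 4.0 4.0
```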

Properties of Standard Deviation

1) s measures spread about the mean; only use when using mean as center.

2) s = 0 when all observations are equal.

3) s is not resistant

How to choose between the 5 number summary and x̅ and s to describe center and spread:

1) Use the 5 number summary when the distribution is skewed or has strong outliers.

2) Use x̅ and s when the distribution is reasonably symmetric.

Summary:

Always plot your data!!!!! Always show title and labels!!!

Graphs

Give the best overall picture of a distribution.

Numerical measures of center and spread

Only give specific facts about a distribution.

Do not describe its entire shape.

Can give a misleading picture of a distribution or the comparison of two or more distributions.

Homework: p. 89 #39–43

Linear Transformations

Why would we want to transform the data??

When univariate data span several orders of magnitude, you have a problem with the large or small values “dominating” the distribution.

How would you create a stem plot of the following data?

3066, 34, 39769, 1396, 902, 307, 7, 10, 540

How do you think the stem plot would look? Useful??

The goal is to change skewed univariate data to be approximately normal and bivariate data to be more linear across a scatterplot.

Linear Transformations

Changes the original variable x into the new variable x_new (x*).

x* = a + bx

Does not change the shape of a distribution.

Can change the center, the spread, or both.

The effects of the changes follow a simple pattern.

Adding the constant (a) shifts all values of x upward or downward by the same amount.

Adds (a) to the measures of center and to the quartiles but does not change measures of spread.

Multiplying by the positive constant (b) changes the size of the unit of measurement.

Multiplies both the measures of center (mean and median) and the measures of spread (standard deviation and IQR) by (b).

Example: you have 5 observations; 3, 5, 10, 12, 15

x̅ / M / S
1) For the obs
2) Obs + 10
3) Obs * 6
4) Obs * 6 + 10

For the transformation x*= a+bx:

1) All measures are multiplied by b

2) a is added to x̅ and 5 number summary

(Measures of spread are unaffected by a)

x̅* = a + bx̅    M* = a + bM    Q1* = a + bQ1

s* = bs    min* = a + b(min)    Q3* = a + bQ3

max* = a + b(max)    IQR* = b(IQR)
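These rules can be checked on the five observations from the example above (3, 5, 10, 12, 15) with a = 10 and b = 6. The Python sketch below is one way to do that; the helper name `summarize` is made up for this illustration, and b is assumed to be positive.

```python
from statistics import mean, median, stdev

obs = [3, 5, 10, 12, 15]
a, b = 10, 6  # x* = a + bx, with b > 0


def summarize(data):
    """Return (mean, median, sample standard deviation)."""
    return mean(data), median(data), stdev(data)


original = summarize(obs)
shifted = summarize([x + a for x in obs])      # adding a: center moves, spread does not
scaled = summarize([b * x for x in obs])       # multiplying by b: center and spread scale
both = summarize([a + b * x for x in obs])     # x* = a + bx

for label, (m, med, s) in [("obs", original), ("obs + 10", shifted),
                           ("obs * 6", scaled), ("obs * 6 + 10", both)]:
    print(label, m, med, round(s, 2))
```

The means come out as 9, 19, 54, and 64, the medians as 10, 20, 60, and 70, and s is about 4.95 for the first two rows and about 29.70 (6 times as large) for the last two.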

Chapter review

D) Items used for describing plots:
