LECTURE 2, FRIDAY AUGUST 29, 2008

CHAPTER 2 DESCRIBING DISTRIBUTIONS WITH NUMBERS

We want to develop numerical methods to describe data sets. The numbers to be used depend on the SHAPE of the distribution:

SHAPE / CENTER / SPREAD
SYMMETRIC / MEAN / STANDARD DEVIATION
SKEWED / MEDIAN / MINIMUM, 1ST QUARTILE, 3RD QUARTILE, MAXIMUM

MEAN: Sum of all the data values divided by the count of all the data values

Same as the arithmetic average

You will use your calculator in the statistical mode to calculate this.
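For anyone checking the calculator result in software, the definition can be written as a minimal Python sketch. The function name `mean` and the sample data are illustrative assumptions, not part of the lecture.

```python
def mean(values):
    """Arithmetic average: sum of the data values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [1, 2, 3, 4, 5]  # illustrative data set
print(mean(data))       # 15 / 5 = 3.0
```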

MEDIAN: The middle point (countwise) in the data.

STEP 1: Put the data in ascending order.

STEP 2: Location in the ordered list = (n+1)/2

STEP 3: The data value at that position is the median of the data.

Example when n is an odd number

Position of median is a whole number, 3 for example when n=5. The median is the data value at that position.

Example when n is an even number

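Position of median is a number ending in .5, 20.5 for example when n=40. The median is then the average of the two data values on either side of that position, the 20th and 21st values when n=40.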

The median has as many values greater than it as it has smaller than it, i.e., the median separates the data into equal halves (countwise), and the median does not belong to either half.
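The position rule for the median can be sketched in Python as follows. This is one possible reading of the steps, assuming that a position ending in .5 means averaging the two neighbouring values; the function name `median_by_position` is hypothetical.

```python
def median_by_position(values):
    """Median found by sorting, then locating position (n + 1) / 2."""
    ordered = sorted(values)            # put the data in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2              # 1-based location in the ordered list
    if position.is_integer():           # n is odd: a single middle value
        return ordered[int(position) - 1]
    below = ordered[int(position) - 1]  # value just before the .5 position
    above = ordered[int(position)]      # value just after the .5 position
    return (below + above) / 2          # n is even: average the two


print(median_by_position([5, 1, 3, 2, 4]))  # n = 5, position 3
print(median_by_position([1, 2, 3, 4]))     # n = 4, position 2.5
```

The subtraction of 1 is needed only because Python lists count from 0 while the position rule counts from 1.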

Comparing the mean and the median.

If the data set is exactly symmetric about its center, the mean and the median will be equal.

If the data set is left skewed, the mean will be less than the median.

If the data set is right skewed, the mean will be greater than the median.

The mean is affected by extreme values, either extremely high or extremely low.

The median is affected less by extreme values.

Example: Data 1,2,3,4,5 Mean = 3 Median = 3

Data 1,2,3,4,9 Mean = 3.8 Median = 3

In some cases the median is preferred over the mean because the median represents what is TYPICAL, whereas the mean might be inflated greatly by extremely high values. This might be the case with variables associated with money, such as household income or sale price of houses.
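The effect of one extreme value on the two measures of center can be reproduced with Python's standard `statistics` module. This sketch reuses the two small data sets from the example; the helper name `center_summary` is an assumption made for illustration.

```python
from statistics import mean, median


def center_summary(values):
    """Return (mean, median) so the two measures can be compared."""
    return mean(values), median(values)


symmetric = [1, 2, 3, 4, 5]
with_high_value = [1, 2, 3, 4, 9]  # same data with the largest value raised

for data in (symmetric, with_high_value):
    data_mean, data_median = center_summary(data)
    print(data, "mean =", data_mean, "median =", data_median)
```

Raising the largest value pulls the mean up while the median stays at the middle value, which is why the median is the usual choice for skewed, money-related variables.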

MEASURING SPREAD

Simplest approach: Data values vary FROM ______ TO ______ (the minimum and the maximum)

Better ways: SYMMETRIC distributions: STANDARD DEVIATION

SKEWED distributions: 5 NUMBER SUMMARY:

MINIMUM, 1ST QUARTILE, MEDIAN, 3RD QUARTILE, MAXIMUM

1ST QUARTILE: median of the lower half, so you repeat the procedure for finding the median except n is the number of values in the lower half.

3RD QUARTILE: median of the upper half, so you repeat the procedure again with n equal to the number of values in the upper half.
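The quartile procedure can be sketched in Python as a median of each half. This assumes, in line with the statement that the median belongs to neither half, that the middle value is left out of both halves when n is odd. The function name `five_number_summary` is hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, 1st quartile, median, 3rd quartile, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower_half = ordered[:half]      # values below the median position
    upper_half = ordered[n - half:]  # values above the median position
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )


print(five_number_summary([1, 2, 3, 4, 9]))
```

Statistical software often uses other quartile conventions, so its quartiles can differ slightly from the ones this hand method gives.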

BOX PLOT: A graphical presentation of the 5 number summary

Each portion of the box plot represents 25% of the data set, countwise.

1.5 IQR RULE FOR OUTLIERS
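The notes name this rule without spelling it out. In its usual textbook form, the interquartile range is IQR = 3rd quartile - 1st quartile, and a data value is flagged as a suspected outlier if it lies more than 1.5 x IQR below the 1st quartile or above the 3rd quartile. A minimal Python sketch under that assumption follows; the function name `iqr_outliers` and the data set are illustrative, not from the lecture.

```python
def iqr_outliers(values, q1, q3):
    """Return the values lying beyond 1.5 * IQR from the quartiles."""
    iqr = q3 - q1                   # interquartile range
    low_fence = q1 - 1.5 * iqr      # anything below this is flagged
    high_fence = q3 + 1.5 * iqr     # anything above this is flagged
    return [x for x in values if x < low_fence or x > high_fence]


# Hypothetical data; quartiles found by hand as medians of each half:
# lower half 2, 3, 4, 5 gives 3.5 and upper half 6, 7, 8, 30 gives 7.5.
data = [2, 3, 4, 5, 6, 7, 8, 30]
print(iqr_outliers(data, q1=3.5, q3=7.5))  # fences are -2.5 and 13.5
```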

STANDARD DEVIATION: Generally used as a measure of spread when the data is symmetric about the center. One number can describe both sides of the distribution.

Standard deviation is based on the distance of each data value from the mean of the data.

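Formula: $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$, where $\bar{x}$ is the mean of the data and n is the count of data values.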

Your calculator will determine standard deviation for you.
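As a cross-check on the calculator, the computation can be sketched in Python. The divisor n - 1 (the sample standard deviation) is an assumption inferred from the worked values in these notes, such as 1.5811 for the data 1,2,3,4,5. The function name `sample_std_dev` is hypothetical.

```python
import math
from statistics import stdev


def sample_std_dev(values):
    """Standard deviation based on each value's distance from the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data values")
    data_mean = sum(values) / n
    # Sum of squared distances of each data value from the mean.
    squared_distances = sum((x - data_mean) ** 2 for x in values)
    return math.sqrt(squared_distances / (n - 1))  # assumed n - 1 divisor


data = [1, 2, 3, 4, 5]
print(round(sample_std_dev(data), 4))  # 1.5811
print(round(stdev(data), 4))           # the standard library agrees
```

The standard library's `statistics.stdev` uses the same n - 1 divisor, so it serves as an independent check on the hand-written version.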

The symbol for std dev depends on the model of calculator.

It is Sx in some TI calculators

in some calculators

Facts about standard deviation:

Std Dev, s, measures the spread from the mean and should only be used when the mean is used to describe the center. Std Dev would never be used with the median.

Std Dev, s, is never negative. The lowest possible value is 0, which occurs only when all the data values are equal, and there is no upper limit. The higher the Std Dev, the greater the spread.

Std Dev, s, always has the same units as the measurements themselves.

Std Dev, s, is affected by extremely large or extremely small data values. Either one would inflate Std Dev.

Example: Data 1,2,3,4,5 Mean = 3 Median = 3 std dev = 1.5811

Data 1,2,3,4,9 Mean = 3.8 Median = 3 std dev = 3.1145

Data 1,2,3,4,20 Mean = 6 Median = 3 std dev = 7.9057