Ch2.2, Ch2.3 Numerical Summary Measures of Variability for Data

Topics:

  • Measures of variability (spread):
  • Deviation
  • Variance/standard deviation
  • Interquartile range
  • Statistical definition of Outliers: the 1.5 x IQR criterion for outliers

------

II: Measures of Variability (Spread)

(1) Deviation: the difference between the observation and the mean

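Deviation of $x_i$ is defined as $x_i - \bar{x}$.

What is the mean deviation ($\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})$)?

It is always 0, because the positive and negative deviations cancel. So the mean deviation does not measure the spread in the data. We might use the average of the absolute deviations:

$\frac{1}{n}\sum_{i=1}^{n}|x_i - \bar{x}|$.

But this measure is mathematically inconvenient, so we consider the following definition instead.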
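To see why the mean deviation cannot measure spread, write $x_1, \dots, x_n$ for the observations and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ for their mean (this notation is assumed here). The deviations then always sum to zero:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

As a small made-up illustration, the data 1, 2, 6 have mean 3 and deviations -2, -1, 3, which sum to 0, while the average absolute deviation is (2 + 1 + 3)/3 = 2.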

(2) Variance / Standard Deviation

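Sample Variance: the sample variance $s^2$ of n observations $x_1, \dots, x_n$ is the average (using n-1) of the squared deviations:

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

► An alternative form convenient for calculation:

$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$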
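The alternative form follows from expanding the square and using $\sum x_i = n\bar{x}$. A short derivation, with $x_1, \dots, x_n$ the observations and $\bar{x}$ their mean (all sums run over i = 1, ..., n):

```latex
\sum (x_i - \bar{x})^2
  = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2
  = \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
  = \sum x_i^2 - n\bar{x}^2
```

Dividing both sides by n - 1 gives the calculation-friendly form of the sample variance.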

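  • Sample Standard Deviation (SD): the sample standard deviation s is the square root of $s^2$: $s = \sqrt{s^2}$

Ex. Grades of 9 students on a HW assignment: 86,85,81,82,84,84,83,84,87. $\bar{x} = 84$. What is the SD? (Also know that $\sum x_i^2 = 63532$.)

$s^2 = \frac{1}{8}\left(63532 - 9 \times 84^2\right) = 28/8 = 3.5$, so $s = \sqrt{3.5} \approx 1.87$.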

  • Interpretation: A typical student’s score is about 1.87 points away from the mean score (84).
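The HW-grade calculation can be checked numerically. The following is a minimal Python sketch, not a prescribed method; the variable names are illustrative. It computes the sample variance both from the definition and from the shortcut form, then compares against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics

grades = [86, 85, 81, 82, 84, 84, 83, 84, 87]
n = len(grades)
mean = sum(grades) / n

# Definition: average (using n - 1) of the squared deviations from the mean.
var_definition = sum((x - mean) ** 2 for x in grades) / (n - 1)

# Shortcut form: (sum of squares - n * mean^2) / (n - 1).
sum_of_squares = sum(x * x for x in grades)
var_shortcut = (sum_of_squares - n * mean ** 2) / (n - 1)

# The standard deviation is the square root of the variance.
sd = math.sqrt(var_definition)

print("mean:", mean)
print("sum of squares:", sum_of_squares)
print("variance (definition):", var_definition)
print("variance (shortcut):", var_shortcut)
print("SD:", round(sd, 2))

# Cross-check with the standard library (both use the n - 1 divisor).
print("library:", statistics.variance(grades), round(statistics.stdev(grades), 2))
```

Both forms should agree up to floating-point rounding; the shortcut form is mainly a convenience for hand calculation.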

Comments:

  1. SD should be used as a measure of spread only when mean is used as the measure of center
  2. SD=0 implies all data points are the same (no variability)
  3. Like mean, SD is strongly influenced by outliers

Ex. If a coding error turns 87 into 870, then $\bar{x}$ = 171 and the SD becomes about 262.
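The sensitivity of the mean and SD to a single bad value can be seen by recomputing both with the miscoded score. This is a small Python sketch under the assumption that only the last grade (87) is mistyped as 870; the names `miscoded`, `mean_bad`, and `sd_bad` are illustrative.

```python
import statistics

grades = [86, 85, 81, 82, 84, 84, 83, 84, 87]

# Assumed coding error: the last score, 87, is recorded as 870.
miscoded = grades[:-1] + [870]

mean_ok, sd_ok = statistics.mean(grades), statistics.stdev(grades)
mean_bad, sd_bad = statistics.mean(miscoded), statistics.stdev(miscoded)

print("original: mean =", mean_ok, " SD =", round(sd_ok, 2))
print("miscoded: mean =", mean_bad, " SD =", round(sd_bad, 2))
```

One wrong value out of nine is enough to move the SD from under 2 points to several hundred.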

(3) Interquartile Range (IQR)

(a) Quartiles: Q1 = first quartile = median of the lower half of the data. Q3 = third quartile = median of the upper half of the data (Q2 = median).

IQR = Q3 – Q1 (good spread measure in the presence of outliers)

To find quartiles:

  1. Sort the data and divide the data points into two halves. (If there is an odd number of observations, include the median in each half.)
  2. Lower quartile Q1= median of the lower half of the data.
  3. Upper quartile Q3 = median of the upper half of the data.

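(b) Interquartile Range (IQR)

  • IQR = Q3 – Q1
  • Interpretation of IQR: a measure of the variability (spread) of the data; it is the width of the interval that covers roughly the middle half of the observations. It is similar to s but usually larger than s.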

Ex. Grades of 9 students on a HW assignment: 86,85,81,82,84,84,83,84,87.

The sorted hw scores are:

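81 82 83 84 84 84 85 86 87. So median = 84. The lower half (including the median) is 81 82 83 84 84 and the upper half is 84 84 85 86 87, so Q1 = 83, Q3 = 85, IQR = 85 – 83 = 2.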
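The quartile rule used in these notes (median of each half, with the median included in both halves when n is odd) can be sketched in Python as follows. The function name `quartiles` is illustrative. Library routines such as `statistics.quantiles` or `numpy.percentile` use other conventions and may give slightly different quartiles on some data sets.

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) using the 'median of each half' rule.

    When the number of observations is odd, the median is included
    in both the lower half and the upper half.
    """
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2              # size of each half (halves overlap at the median if n is odd)
    lower, upper = xs[:half], xs[n - half:]
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)

grades = [86, 85, 81, 82, 84, 84, 83, 84, 87]
q1, med, q3 = quartiles(grades)
iqr = q3 - q1
print("Q1 =", q1, " median =", med, " Q3 =", q3, " IQR =", iqr)
```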

Ex. Rainfall in NC in some 15 months (stem: ones digit; leaf: tenths digit)

1 | 0

2 | 25

3 | 45

4 | 11667

5 | 449

6 | 0

7 |

8 | 2

Find quartiles and the IQR

Remark 1: The 1.5 x IQR Criterion for Outliers

An observation is called an outlier if it is more than 1.5 × IQR above Q3 or more than 1.5 × IQR below Q1.

An extreme outlier (an observation more than 3 × IQR above Q3 or more than 3 × IQR below Q1) may indicate a data entry error or unusual characteristics in the data that need careful investigation.

Ex. Use the 1.5xIQR rule to check if there is any outlier in the Rainfall dataset.

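Q1 = 3.45, Q3 = 5.4, IQR = 5.4 – 3.45 = 1.95, so 1.5 × IQR = 2.925.

Q1 – 2.925 = 3.45 – 2.925 = 0.525 (no data point smaller than 0.525; the minimum is 1.0)

Q3 + 2.925 = 5.4 + 2.925 = 8.325 (no data point larger than 8.325; the maximum is 8.2)

So the rainfall data contain no outliers by the 1.5 × IQR rule.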
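The outlier check for the rainfall data can be reproduced with a short Python sketch. The 15 values are read off the stem-and-leaf plot (stem = ones digit, leaf = tenths digit). The helper `q1_q3` and the names `lower_fence` and `upper_fence` are illustrative, and the quartile rule is the median-of-each-half rule from these notes.

```python
import statistics

# Values read off the stem-and-leaf plot (stem = ones digit, leaf = tenths digit).
rainfall = [1.0, 2.2, 2.5, 3.4, 3.5, 4.1, 4.1, 4.6, 4.6, 4.7,
            5.4, 5.4, 5.9, 6.0, 8.2]

def q1_q3(data):
    """Q1 and Q3 as medians of the lower/upper halves (median included if n is odd)."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2
    return statistics.median(xs[:half]), statistics.median(xs[n - half:])

q1, q3 = q1_q3(rainfall)
iqr = q3 - q1

# 1.5 x IQR criterion: anything outside the two fences is flagged as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in rainfall if x < lower_fence or x > upper_fence]

print("Q1, Q3, IQR:", round(q1, 3), round(q3, 3), round(iqr, 3))
print("fences:", round(lower_fence, 3), round(upper_fence, 3))
print("outliers:", outliers)
```

Replacing the factor 1.5 with 3 in the fences gives the corresponding check for extreme outliers.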

Remark 2: Standard numerical summaries of a data set include sample size, center, and spread.

For a reasonably symmetric distribution with no outliers, use the mean $\bar{x}$ and the SD $s$.

For all other situations, use the median and the IQR.

Remark 3: The 5-number summary

min, Q1, Median, Q3, max
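As an illustration, the five-number summary of the nine HW grades can be computed with a small Python sketch. The function name `five_number_summary` is illustrative, and the quartiles follow the median-of-each-half rule from these notes (median included in both halves when n is odd).

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-each-half quartile rule."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2              # halves overlap at the median when n is odd
    q1 = statistics.median(xs[:half])
    q3 = statistics.median(xs[n - half:])
    return xs[0], q1, statistics.median(xs), q3, xs[-1]

grades = [86, 85, 81, 82, 84, 84, 83, 84, 87]
summary = five_number_summary(grades)
print(summary)
```

For the grades this gives min 81, Q1 83, median 84, Q3 85, and max 87.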

Remark 4: Change of Unit

  1. Adding (or subtracting) a constant to each observation will NOT change the measures of spread, such as SD and IQR (but it does change the measures of center and the quartiles).

If the new unit = 60 + the old unit, does the spread of the data change?

  2. Multiplying each observation by a constant a multiplies the measures of spread (SD and IQR) by |a|.

Conclusion: If the new unit is aX + b, then the new spread in terms of IQR is |a| times the original IQR, and likewise for the SD.

(Recall that the new center is a × (old center) + b.)

Ex. Temperatures are read in Fahrenheit; the SD of the temperatures is s and the IQR is r. What are the new SD and IQR if we switch to Centigrade? Note that $C = \frac{5}{9}(F - 32)$.

The new SD is 5s/9.

The new IQR = 5r/9.
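The change-of-unit rule can be checked numerically. The Python sketch below uses hypothetical Fahrenheit readings (made up for illustration, not data from these notes), converts them to Centigrade, and compares the SD and IQR before and after. The helper `iqr` uses the median-of-each-half quartile rule.

```python
import math
import statistics

def iqr(data):
    """IQR = Q3 - Q1, with quartiles as medians of the halves (median included if n is odd)."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2
    return statistics.median(xs[n - half:]) - statistics.median(xs[:half])

# Hypothetical temperature readings in Fahrenheit (illustrative values only).
fahrenheit = [50.0, 59.0, 68.0, 77.0, 86.0, 95.0, 104.0]

# C = (5/9)(F - 32): multiply by a = 5/9 after shifting by a constant.
celsius = [(f - 32) * 5 / 9 for f in fahrenheit]

sd_f, sd_c = statistics.stdev(fahrenheit), statistics.stdev(celsius)
iqr_f, iqr_c = iqr(fahrenheit), iqr(celsius)

# The shift by 32 has no effect on spread; the factor 5/9 scales both measures.
print("SD scales by 5/9: ", math.isclose(sd_c, sd_f * 5 / 9))
print("IQR scales by 5/9:", math.isclose(iqr_c, iqr_f * 5 / 9))
```

The subtraction of 32 drops out of both measures, which is why only the factor 5/9 appears in the new SD and IQR.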
