Ch2.2, Ch2.3 Numerical Summary Measures of Variability for Data
Topics:
- Measures of variability (spread):
- Deviation
- Variance/standard deviation
- Interquartile range
- Statistical definition of Outliers: the 1.5 x IQR criterion for outliers
------
II: Measures of Variability (Spread)
(1) Deviation: the difference between the observation and the mean
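Deviation of $x_i$ is defined as $x_i - \bar{x}$.
What is the mean deviation ($\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})$)?
It is always 0, because $\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n}x_i - n\bar{x} = 0$. So the mean deviation does not measure the spread in the data. We might use the average of the absolute deviations instead:
$\frac{1}{n}\sum_{i=1}^{n}|x_i - \bar{x}|$.
But this measure is mathematically inconvenient, so we consider the following definition instead.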
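The cancellation of deviations can be checked numerically. The sketch below is only an illustration: it borrows the HW grades from the variance example in part (2) of these notes, and the variable names are chosen here for readability.

```python
from statistics import mean

# HW grades of 9 students (the example used in part (2) of these notes)
scores = [86, 85, 81, 82, 84, 84, 83, 84, 87]

x_bar = mean(scores)                      # sample mean
deviations = [x - x_bar for x in scores]  # x_i - x_bar for each observation

# Positive and negative deviations cancel, so their average is always zero.
mean_deviation = sum(deviations) / len(scores)

# Averaging the absolute deviations avoids the cancellation.
mean_abs_deviation = sum(abs(d) for d in deviations) / len(scores)

print(mean_deviation)
print(round(mean_abs_deviation, 3))
```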
(2) Variance / Standard Deviation
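Sample Variance: the sample variance $s^2$ of n observations $x_1, \dots, x_n$ is the average (using n-1) of the squared deviations:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
► An alternative form convenient for calculation:
$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right)$
- Sample Standard Deviation (SD): the sample standard deviation s is the square root of $s^2$:
$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$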
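Ex. Grades of 9 students on a HW assignment: 86,85,81,82,84,84,83,84,87. The mean is $\bar{x} = 84$. What is the SD? (Also know that $\sum x_i^2 = 63532$.)
$s^2 = \frac{1}{8}\left(63532 - 9 \times 84^2\right) = 28/8 = 3.5$, so $s = \sqrt{3.5} \approx 1.87$.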
- Interpretation: A random student’s score is about 1.87 points from the mean score (84).
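As a check on the hand calculation, here is a minimal Python sketch that computes the sample variance of the HW grades both from the definition and from the calculation shortcut, then compares them with the standard library's `statistics.variance` and `statistics.stdev` (both use the n-1 divisor). The variable names are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev, variance

scores = [86, 85, 81, 82, 84, 84, 83, 84, 87]
n = len(scores)
x_bar = mean(scores)

# Definition: sum of squared deviations divided by n - 1
s2_definition = sum((x - x_bar) ** 2 for x in scores) / (n - 1)

# Shortcut: (sum of squares - n * mean^2) divided by n - 1
s2_shortcut = (sum(x * x for x in scores) - n * x_bar ** 2) / (n - 1)

s = sqrt(s2_definition)  # sample standard deviation

print(s2_definition, s2_shortcut, variance(scores))
print(round(s, 2), round(stdev(scores), 2))
```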
Comments:
- SD should be used as a measure of spread only when the mean is used as the measure of center
- SD=0 implies all data points are the same (no variability)
- Like the mean, SD is strongly influenced by outliers
Ex. If a coding error turns 87 into 870, then $\bar{x}$ = 171 and the SD becomes about 262.
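The effect of a single miscoded value can be recomputed directly. The sketch below, assuming the same HW grades as above, replaces 87 with 870 and prints the mean and SD before and after the change.

```python
from statistics import mean, stdev

scores = [86, 85, 81, 82, 84, 84, 83, 84, 87]

# Simulate the coding error: 87 is recorded as 870.
miscoded = [870 if x == 87 else x for x in scores]

for label, data in [("original", scores), ("miscoded", miscoded)]:
    # Both the mean and the SD move sharply with one bad value.
    print(label, mean(data), round(stdev(data), 2))
```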
(3) Interquartile Range (IQR)
(a) Quartiles: Q1 = first quartile = median of the lower half of the data. Q3 = third quartile = median of the upper half of the data (Q2 = median).
IQR = Q3 – Q1 (a good measure of spread in the presence of outliers)
To find quartiles:
- Sort the data and divide the data points into 2 halves. (If there is an odd number of observations, include the median in each half.)
- Lower quartile Q1= median of the lower half of the data.
- Upper quartile Q3 = median of the upper half of the data.
(b) Interquartile Range (IQR)
- IQR = Q3 – Q1
- Interpretation of IQR: Measure of variability (spread) of data, similar to s but usually larger than s.
Ex. Grades of 9 students on a HW assignment: 86,85,81,82,84,84,83,84,87.
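The sorted HW scores are:
81 82 83 84 84 84 85 86 87. So median = 84. The lower half (including the median) is 81 82 83 84 84, so Q1 = 83; the upper half is 84 84 85 86 87, so Q3 = 85. IQR = 85 – 83 = 2.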
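The quartile rule described above can be sketched in Python as follows. The function name `quartiles` is made up for this illustration. It follows the convention of these notes (with an odd number of observations, the median is included in both halves); library routines may use other quartile conventions and can give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule of these notes."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2      # for odd n, each half includes the median
    lower = xs[:half]        # lower half of the sorted data
    upper = xs[n - half:]    # upper half of the sorted data
    return median(lower), median(xs), median(upper)

scores = [86, 85, 81, 82, 84, 84, 83, 84, 87]
q1, med, q3 = quartiles(scores)
iqr = q3 - q1
print(q1, med, q3, iqr)
```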
Ex. Rainfall in NC in some 15 months
1 | 0
2 | 25
3 | 45          Stem: ones digit
4 | 11667       Leaf: tenths digit
5 | 449
6 | 0
7 |
8 | 2
Find quartiles and the IQR
Remark 1: The 1.5 x IQR Criterion for Outliers
An observation is called an outlier if it lies more than 1.5 x IQR above Q3 or more than 1.5 x IQR below Q1, i.e., if it is larger than Q3 + 1.5 x IQR or smaller than Q1 – 1.5 x IQR.
An extreme outlier (an observation more than 3 x IQR above Q3 or more than 3 x IQR below Q1) may indicate a data entry error or unusual characteristics in the data that need careful investigation.
Ex. Use the 1.5 x IQR rule to check if there is any outlier in the Rainfall dataset.
Q1 = 3.45, Q3 = 5.4, IQR = 5.4 – 3.45 = 1.95, so 1.5 x IQR = 2.925.
Q1 – 2.925 = 3.45 – 2.925 = 0.525 (no data point smaller than 0.525)
Q3 + 2.925 = 5.4 + 2.925 = 8.325 (no data point larger than 8.325)
So there are no outliers in the Rainfall dataset.
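The same check can be sketched in code. The rainfall values below are read off the stemplot (stem = ones digit, leaf = tenths digit). The helper `quartiles` and the names `lower_fence` and `upper_fence` are introduced only for this illustration, and the quartiles follow the median-of-halves rule of these notes.

```python
from statistics import median

# Rainfall values read from the stemplot
rainfall = [1.0, 2.2, 2.5, 3.4, 3.5, 4.1, 4.1, 4.6, 4.6, 4.7,
            5.4, 5.4, 5.9, 6.0, 8.2]

def quartiles(data):
    """Return (Q1, median, Q3); for odd n the median is in both halves."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[:half]), median(xs), median(xs[n - half:])

q1, med, q3 = quartiles(rainfall)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # anything below this is flagged
upper_fence = q3 + 1.5 * iqr   # anything above this is flagged
outliers = [x for x in rainfall if x < lower_fence or x > upper_fence]

print(round(q1, 3), med, q3, round(iqr, 3))
print(round(lower_fence, 3), round(upper_fence, 3), outliers)
```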
Remark 2: Standard numerical summaries of a data set include sample size, center, and spread.
For a reasonably symmetric distribution with no outliers, use the mean $\bar{x}$ and the SD $s$.
For all other situations, use the median and the IQR.
Remark 3: The 5-number summary
min, Q1, Median, Q3, max
Remark 4: Change of Unit
- Adding (or subtracting) a constant to each observation will NOT change the measures of spread, such as SD and IQR (but it does change the measures of center and the quartiles).
If the new unit = 60 + the old unit, does the spread of the data change? (No.)
- Multiplying each observation by a constant a multiplies the measures of spread (SD and IQR) by |a|.
Conclusion: If the new unit is aX + b, then the new spread (SD or IQR) is |a| times the original spread.
(Recall that the new center is a x (old center) + b; e.g., the new mean is $a\bar{x} + b$.)
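For the SD, the conclusion follows from the definition of the sample variance. A short derivation, writing $y_i = a x_i + b$ for the rescaled observations (the symbols $y_i$, $s_x$, and $s_y$ are notation introduced here for the illustration):

```latex
\begin{align*}
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n}(a x_i + b) = a\bar{x} + b, \\
y_i - \bar{y} &= a\,(x_i - \bar{x}), \\
s_y^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2
       = \frac{a^2}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = a^2 s_x^2, \\
s_y &= |a|\, s_x .
\end{align*}
```

The shift b cancels in every deviation, which is why adding a constant leaves the SD unchanged. The same holds for the IQR: each quartile Q becomes aQ + b (with Q1 and Q3 trading places when a < 0), so b cancels in the difference and the IQR is multiplied by |a|.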
Ex. Temperatures are read in Fahrenheit; the SD of the temperatures is s and the IQR is r. What are the new SD and IQR if we switch to Centigrade? Note that $C = \frac{5}{9}(F - 32)$.
The new SD is 5s/9.
The new IQR = 5r/9.
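A quick numerical check of this example, as a sketch only: the Fahrenheit readings below are hypothetical values made up for the illustration, and the helper `iqr` uses the median-of-halves rule of these notes.

```python
from math import isclose
from statistics import median, stdev

def iqr(data):
    """IQR = Q3 - Q1, with the median included in both halves for odd n."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[n - half:]) - median(xs[:half])

# Hypothetical temperature readings in Fahrenheit
fahrenheit = [50.0, 59.0, 68.0, 77.0, 86.0, 95.0, 104.0]
celsius = [(f - 32) * 5 / 9 for f in fahrenheit]   # C = 5/9 (F - 32)
shifted = [f + 60 for f in fahrenheit]             # adding a constant only

# Rescaling multiplies both spread measures by 5/9.
print(isclose(stdev(celsius), 5 / 9 * stdev(fahrenheit)))
print(isclose(iqr(celsius), 5 / 9 * iqr(fahrenheit)))

# A pure shift leaves both spread measures unchanged.
print(isclose(stdev(shifted), stdev(fahrenheit)),
      isclose(iqr(shifted), iqr(fahrenheit)))
```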