3.2 Measures of Dispersion:

Example 2:

Suppose two factories produce batteries. From each factory, 10 batteries are drawn and their lifetimes (in hours) are tested. The lifetimes are:

Factory 1: 10.1, 9.9, 10.1, 9.9, 9.9, 10.1, 9.9, 10.1, 9.9, 10.1

Factory 2: 16, 5, 7, 14, 6, 15, 3, 13, 9, 12.

The mean lifetime is 10 hours for both factories. However, the data show that the batteries produced by factory 1 are much more consistent than those produced by factory 2: every lifetime from factory 1 lies within 0.1 hours of the mean, while the lifetimes from factory 2 vary from 3 to 16 hours. The mean alone cannot reveal this difference, so other measures of the “dispersion” or “variation” of the data are required.

(I) Range:

range=(largest value of the data)-(smallest value of the data).

Example 2 (continued):

Range of lifetime data for factory 1=10.1-9.9=0.2

Range of lifetime data for factory 2=16-3=13

The range of battery lifetimes for factory 1 is much smaller than the one for factory 2.

Note: the range is seldom used as the only measure of dispersion. It depends on only two data values, so it is highly influenced by a single extremely large or extremely small data value.
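As a rough illustration of the note above, the following Python sketch computes the range for both factories and then appends one extreme lifetime to the factory 1 data. The function name `data_range` and the added value 25.0 are hypothetical, chosen only for this illustration; they are not part of the original example.

```python
# Battery lifetimes (in hours) from Example 2.
factory_1 = [10.1, 9.9, 10.1, 9.9, 9.9, 10.1, 9.9, 10.1, 9.9, 10.1]
factory_2 = [16, 5, 7, 14, 6, 15, 3, 13, 9, 12]


def data_range(values):
    """Largest value of the data minus smallest value of the data."""
    return max(values) - min(values)


range_1 = data_range(factory_1)
range_2 = data_range(factory_2)
print(round(range_1, 1), range_2)

# Hypothetical extreme value (an assumption for illustration only):
# one unusually long-lived battery added to the factory 1 sample.
factory_1_with_extreme = factory_1 + [25.0]
range_extreme = data_range(factory_1_with_extreme)
print(round(range_extreme, 1))
```

With the extra value, the range of the factory 1 data rises from about 0.2 to about 15.1, even though the other ten lifetimes are unchanged.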

(II) Interquartile Range:

The interquartile range (IQR) is the difference between the third quartile $Q_3$ and the first quartile $Q_1$. That is,

$\mathrm{IQR} = Q_3 - Q_1$.

Example 2 (continued):

The first quartile and the third quartile for the data from factory 1 are 9.9 and 10.1, respectively, and 6 and 14 for the data from factory 2. Therefore,

IQR (factory 1)=10.1-9.9=0.2

IQR (factory 2)=14-6=8.

The interquartile range of battery lifetimes for factory 1 is much smaller than the one for factory 2.
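The quartiles quoted above can be reproduced with the following Python sketch. It assumes a common textbook position rule that is not stated in this section: compute i = (p/100)n; if i is not an integer, round it up and take the value in that position of the sorted data; if i is an integer, average the values in positions i and i+1. The function names `percentile` and `iqr` are hypothetical. Other quartile conventions exist and may give different values.

```python
# Battery lifetimes (in hours) from Example 2.
factory_1 = [10.1, 9.9, 10.1, 9.9, 9.9, 10.1, 9.9, 10.1, 9.9, 10.1]
factory_2 = [16, 5, 7, 14, 6, 15, 3, 13, 9, 12]


def percentile(values, p):
    """p-th percentile (integer p, 0 < p < 100) by the assumed position rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    # Position i = (p / 100) * n, split into whole part and remainder
    # so that the "is i an integer?" test uses exact integer arithmetic.
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # i is an integer: average the values in positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not an integer: round up; position whole + 1 is index whole.
    return ordered[whole]


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    return percentile(values, 75) - percentile(values, 25)


print(percentile(factory_1, 25), percentile(factory_1, 75), round(iqr(factory_1), 1))
print(percentile(factory_2, 25), percentile(factory_2, 75), iqr(factory_2))
```

Integer arithmetic is used for the position so that the check for an integer position does not depend on floating-point rounding.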

(III) Variance and Standard Deviation:

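population deviation about the mean: $x_i - \mu$, where $\mu$ is the population mean.

sample deviation about the mean: $x_i - \bar{x}$, where $\bar{x}$ is the sample mean.

Intuitively, the population deviation and the sample deviation measure how far each data value is from the “center” of the data. The population variance and the sample variance are then built from the sum of the squared population deviations and the sum of the squared sample deviations,

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

and

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$,

respectively, where $N$ is the population size and $n$ is the sample size. The population standard deviation and the sample standard deviation are the square roots of the population variance and the sample variance:

$\sigma = \sqrt{\sigma^2}$

and

$s = \sqrt{s^2}$,

respectively.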

A large sample variance or sample standard deviation implies that the data are widely “dispersed”, that is, highly varied.

Note:

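Example 2 (continued):

The sample variances for the data from factories 1 and 2 are $s_1^2 = 0.1/9 \approx 0.0111$ and $s_2^2 = 190/9 \approx 21.1111$, respectively. The sample variance of battery lifetimes for factory 2 is therefore 1900 times larger than the one for factory 1.

The sample standard deviations for the data from factories 1 and 2 are $s_1 = \sqrt{0.1/9} \approx 0.1054$ and $s_2 = \sqrt{190/9} \approx 4.5946$,

respectively.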
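The comparison above can be checked numerically. The following Python sketch computes the sample variance directly from the sum of squared deviations about the sample mean, divided by n - 1, and cross-checks the result against Python's standard `statistics` module. The function name `sample_variance` is hypothetical.

```python
import math
import statistics

# Battery lifetimes (in hours) from Example 2.
factory_1 = [10.1, 9.9, 10.1, 9.9, 9.9, 10.1, 9.9, 10.1, 9.9, 10.1]
factory_2 = [16, 5, 7, 14, 6, 15, 3, 13, 9, 12]


def sample_variance(values):
    """Sum of squared deviations about the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


var_1 = sample_variance(factory_1)
var_2 = sample_variance(factory_2)
sd_1 = math.sqrt(var_1)
sd_2 = math.sqrt(var_2)

print("variances:", var_1, var_2, "ratio:", var_2 / var_1)
print("standard deviations:", sd_1, sd_2)

# Cross-check against the standard library's sample statistics.
assert math.isclose(var_2, statistics.variance(factory_2))
assert math.isclose(sd_1, statistics.stdev(factory_1))
```

The ratio of the two variances comes out at about 1900. Because of floating-point arithmetic and rounding, the last printed digits may differ slightly from the four-decimal figures quoted in the text.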

(IV) Coefficient of Variation:

The coefficient of variation is another useful statistic for measuring the dispersion of the data. The coefficient of variation is

$\mathrm{CV} = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$.

The coefficient of variation is invariant with respect to the scale of the data. On the other hand, the standard deviation is not scale-invariant. The following example demonstrates the property.

Example 2 (continued):

In the battery data from factory 1, suppose the measurement is in minutes rather than hours. Then, the data are 606, 594, 606, 594, 594, 606, 594, 606, 594, 606.

Thus, the standard deviation becomes 6.3245, which is 60 times larger than the value 0.1054 based on the original data measured in hours. However, whether the data are measured in hours or in minutes, the coefficient of variation is

$\mathrm{CV} = \frac{0.1054}{10} \times 100\% = \frac{6.3245}{600} \times 100\% \approx 1.054\%$.

Note: since the coefficient of variation is scale-invariant, it is very useful for comparing the dispersion of data sets measured in different units. For example, in the previous battery data, if the lifetimes of the batteries from factory 1 and factory 2 are measured in minutes and hours, respectively, the standard deviation for factory 1, 6.3245, would be larger than the one for factory 2, 4.5946. However, the coefficient of variation for factory 1, 1.054%, is still much smaller than the one for factory 2, 45.946%.
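The scale-invariance property can be checked with a short Python sketch. It uses the sample standard deviation and sample mean from the standard `statistics` module; the function name `coefficient_of_variation` and the variable names are hypothetical.

```python
import statistics

# Battery lifetimes from Example 2, in hours.
factory_1_hours = [10.1, 9.9, 10.1, 9.9, 9.9, 10.1, 9.9, 10.1, 9.9, 10.1]
factory_2_hours = [16, 5, 7, 14, 6, 15, 3, 13, 9, 12]

# The same factory 1 data converted to minutes.
factory_1_minutes = [60 * x for x in factory_1_hours]


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


sd_1_hours = statistics.stdev(factory_1_hours)
sd_1_minutes = statistics.stdev(factory_1_minutes)
sd_2_hours = statistics.stdev(factory_2_hours)

cv_1_hours = coefficient_of_variation(factory_1_hours)
cv_1_minutes = coefficient_of_variation(factory_1_minutes)
cv_2_hours = coefficient_of_variation(factory_2_hours)

# The standard deviation changes with the unit of measurement ...
print("SD factory 1:", sd_1_hours, "(hours)", sd_1_minutes, "(minutes)")
# ... while the coefficient of variation does not (up to rounding error).
print("CV factory 1:", cv_1_hours, "(hours)", cv_1_minutes, "(minutes)")
print("CV factory 2:", cv_2_hours, "(hours)")
```

In mixed units the standard deviation for factory 1 (minutes) exceeds the one for factory 2 (hours), yet the coefficient of variation still ranks factory 1 as far less dispersed.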

Example 3:

The amount of time (in minutes) that a sample of students spends watching television per day is given below.

40 / 25 / 35 / 30 / 20 / 40 / 30 / 20
40 / 10 / 30 / 20 / 10 / 5 / 20

Compute:

(a) The mean.

(b) The standard deviation.

(c) The coefficient of variation.

(d) The 40th percentile.

(e) The mode.

(f) The interquartile range.

(g) Construct a frequency distribution, a cumulative frequency distribution and a relative frequency distribution. Let the first class be 1-10.

[solution:]

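(a) $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{375}{15} = 25$.

(b) $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{1800}{14}} \approx 11.34$.

(c) $\mathrm{CV} = \frac{s}{\bar{x}} \times 100\% = \frac{11.34}{25} \times 100\% \approx 45.36\%$.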

(d)

1. The data, arranged in ascending order, are

5 / 10 / 10 / 20 / 20 / 20 / 20 / 25
30 / 30 / 30 / 35 / 40 / 40 / 40

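2. The position index is $i = \frac{40}{100} \times 15 = 6$, which is an integer.

Thus, the average of the 6th and 7th ordered values,

$\frac{20 + 20}{2} = 20$,

is the 40th percentile.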

(e) The mode is 20, which occurs 4 times.

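(f) Since the first quartile is $Q_1 = 20$ and the third quartile is $Q_3 = 35$,

$\mathrm{IQR} = Q_3 - Q_1 = 35 - 20 = 15$.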

(g)

Class / Frequency / Cumulative Frequency / Relative Frequency
1-10 / 3 / 3 / 0.200
11-20 / 4 / 7 / 0.267
21-30 / 4 / 11 / 0.267
31-40 / 4 / 15 / 0.267
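The table in part (g) can be reproduced with the following Python sketch. The class limits come from the problem statement (first class 1-10, with the same width assumed for the remaining classes); the function name `frequency_table` is hypothetical. Relative frequencies are kept as exact fractions and converted to decimals only for printing.

```python
from fractions import Fraction

# Daily television time (in minutes) from Example 3.
minutes = [40, 25, 35, 30, 20, 40, 30, 20, 40, 10, 30, 20, 10, 5, 20]

# Class limits (inclusive on both ends); width 10 is assumed for all classes.
classes = [(1, 10), (11, 20), (21, 30), (31, 40)]


def frequency_table(values, class_limits):
    """Return rows of (class label, frequency, cumulative, relative frequency)."""
    n = len(values)
    rows = []
    cumulative = 0
    for low, high in class_limits:
        frequency = sum(1 for x in values if low <= x <= high)
        cumulative += frequency
        rows.append((f"{low}-{high}", frequency, cumulative, Fraction(frequency, n)))
    return rows


table = frequency_table(minutes, classes)
for label, frequency, cumulative, relative in table:
    print(f"{label:>5}  {frequency:>2}  {cumulative:>2}  {float(relative):.3f}")
```

Keeping the relative frequencies as fractions makes it easy to confirm that they sum to exactly 1, which rounded decimals do not guarantee.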

Online Exercise:

Exercise 3.2.1

Exercise 3.2.2
