Chapter 3: Describing Data: Numerical Measures

Homework #3 (Week 4): Chapter 3 Exercises 10, 14, 22, 30, 36 and 42.

Summarising Techniques: Numerical Methods

(i) Measures of Location (central tendency of data)

(ii) Measures of Dispersion ("spreadoutness" of data)

(i) Measures of Location

1. Arithmetic Mean ("Mean" or "Average"): Add all the observations and divide by the number of observations

("Population") mean: $\mu = \frac{\sum x}{n}$, where $\mu$ is read "mu"

where $\sum x = x_1 + x_2 + \dots + x_n$, with $n$ representing the size of the "population"

Examples:

(A) Ungrouped Data: 5, 8, 6 and 1

Mean: $\mu = \frac{\sum x}{n} = \frac{5 + 8 + 6 + 1}{4} = 5$

(B) Grouped Data: 5 occurs 8 times, 8 occurs 2 times, 6 occurs 4 times and 2 occurs once.

Formula: Mean: $\mu = \frac{\sum f_i x_i}{\sum f_i}$

[Note: $\sum f_i = n$]

Mean: $\mu = \frac{8 \times 5 + 2 \times 8 + 4 \times 6 + 1 \times 2}{8 + 2 + 4 + 1} = \frac{82}{15} \approx 5.47$

Note: Could "ungroup" the data and use the earlier method

$\mu = \frac{5+5+5+5+5+5+5+5+8+8+6+6+6+6+2}{8 + 2 + 4 + 1} = \frac{82}{15} \approx 5.47$

Example:

Table 4: The calculation of average wealth

• Use of mid-point as representative value

• Mean wealth appears “high”
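As a rough check on the Table 4 calculation, the grouped-mean formula can be applied to the class mid-points. This is a minimal sketch that assumes the mid-points (in £000) and frequencies listed in Table 4; the variable names are illustrative. The unrounded total of fx is 1,389,772.5, because the table rounds 51,852.5 up to 51,853.

```python
# Class mid-points in £000 and class frequencies, copied from Table 4.
midpoints = [5, 17.5, 32.5, 45, 55, 70, 90, 150, 250, 400, 750, 2000]
freqs = [4021, 2963, 2984, 1423, 1172, 1947, 1485, 2199, 564, 327, 137, 59]

total_f = sum(freqs)                                    # sum of f
total_fx = sum(f * x for f, x in zip(freqs, midpoints))  # sum of f*x
mean_wealth = total_fx / total_f                        # in £000

print(total_f)                # 19281
print(round(mean_wealth, 2))  # 72.08, i.e. about £72,080
```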

Comments on Mean

• Mean is the balancing point ("fulcrum") of the histogram

• Extreme values ("outliers") seem to influence the mean excessively

• Population mean is also referred to as expected value, i.e. $E(x) = \mu$

[Useful for later in course.]

2. Median

(ii) Grouped data

For grouped data we must first identify the class interval that contains the median observation. Then we must calculate where in the interval that observation lies.

19,281 observations

9,641st observation?

6,984 observations less than £25,000

9,968 observations less than £40,000

Median is approximately £38,500
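The position of the median within its class can be estimated by linear interpolation. The sketch below assumes that observations are spread evenly across the £25,000 to £40,000 class, an assumption the notes do not state; the variable names are illustrative. It gives roughly £38,356, close to the rounded figure quoted above.

```python
n = 19281
median_position = (n + 1) / 2   # the 9,641st observation

# Class containing the median: £25,000 up to £40,000.
lower, upper = 25_000, 40_000
cum_below = 6984                # observations below £25,000
cum_up_to = 9968                # observations below £40,000
class_freq = cum_up_to - cum_below

# How far into the class the median observation lies (0 to 1).
fraction = (median_position - cum_below) / class_freq
median = lower + fraction * (upper - lower)

print(round(median))  # 38356
```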

Comments on Median

• Extreme values ("outliers") do not influence the median excessively

• Median generalises into percentiles, deciles, quintiles and quartiles (see textbook), e.g. as used in some USA-based graduate school entrance examinations

(ii) Measures of Dispersion ("spreadoutness" of data)

1. Range (or Inter-Quartile Range)

2. “Mean Deviation” (Note: Different from Mean (Absolute) Deviation in Textbook.)

3. Variance

4. Standard Deviation

5. Coefficient of Variation (not in textbook explicitly)

6. Z-Scores (not in textbook explicitly)

2. “Mean Deviation”: average of the deviations from the mean

Formula: Mean Deviation = $\frac{\sum (x_i - \mu)}{n}$

Example: Ungrouped Data: 5, 8, 6 and 1

 = (x)/n = (5 + 8 + 6 + 1)/4 = 5 as before

Mean Deviation = (5 - 5) + (8 - 5) + (6 - 5) + (1 - 5) = 0

4

Problem: Mean Deviation = 0 always (concept of fulcrum)

Technical Proof: Mean Deviation = $\frac{\sum (x_i - \mu)}{n} = \frac{\sum x_i - n\mu}{n} = \frac{\sum x_i}{n} - \frac{n\mu}{n} = \mu - \mu = 0$
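The same result can be seen numerically. The sketch below uses a hypothetical helper, `mean_deviation`, and exact fractions to avoid rounding error; it assumes integer observations, as in the examples in these notes.

```python
from fractions import Fraction

def mean_deviation(data):
    """Average of the signed deviations from the mean (integer data assumed)."""
    n = len(data)
    mu = Fraction(sum(data), n)            # exact mean
    return sum(x - mu for x in data) / n   # positive and negative deviations cancel

print(mean_deviation([5, 8, 6, 1]))                    # 0
print(mean_deviation([5] * 8 + [8] * 2 + [6] * 4 + [2]))  # 0
```

Because the signed deviations always cancel, a usable measure of spread has to remove the signs, for example by squaring the deviations as the variance does.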

3. Variance

Grouped Data

Formula: $\sigma^2 = \frac{\sum f (x_i - \mu)^2}{\sum f} = \frac{\sum f (x_i - \mu)^2}{n}$

Table 5
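The grouped-data variance formula can be applied to the wealth data as follows. This is a minimal sketch, assuming the class mid-points (in £000) and frequencies from the wealth tables; the variable names are illustrative. Results are approximate because each class is represented by its mid-point.

```python
import math

# Class mid-points in £000 and class frequencies from the wealth tables.
midpoints = [5, 17.5, 32.5, 45, 55, 70, 90, 150, 250, 400, 750, 2000]
freqs = [4021, 2963, 2984, 1423, 1172, 1947, 1485, 2199, 564, 327, 137, 59]

n = sum(freqs)
mu = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Sum of frequency-weighted squared deviations from the mean.
sum_sq = sum(f * (x - mu) ** 2 for f, x in zip(freqs, midpoints))
variance = sum_sq / n            # population variance, units (£000)^2
std_dev = math.sqrt(variance)    # standard deviation, units £000

print(round(variance))    # approximately 19817
print(round(std_dev, 1))  # approximately 140.8
```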

5. Coefficient of Variation

If two sets of data have the same means then it is easy to compare their variations by calculating (and comparing) their standard deviations. However, if the means are different then the comparisons of spread will not be so obvious.

Example/Question: A hospital is comparing the times patients are waiting for two types of operation. For bypass surgery the mean wait is 17 weeks with a standard deviation of 6 weeks, while for hip replacement the mean is 11 months with a standard deviation of 1 month. Which operation has the higher variability?

Answer: The problem here is not only that the means are quite different but also that the units are different (weeks and months). In order to compare the variability the coefficient of variation is calculated.

This is defined as: $\text{Coefficient of Variation} = \frac{\text{Standard Deviation}}{\text{Mean}}$

This is usually expressed as a percentage by multiplying by 100.

For bypass surgery the coefficient of variation is (6/17) x 100 = 35.3%, while for hip replacement it is (1/11) x 100 = 9.1%. Therefore, relative to the mean, the bypass surgery has a larger spread or variation.
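The hospital comparison can be reproduced with a small Python sketch. The function name `coefficient_of_variation` is illustrative. The last line shows why the measure copes with different units: rescaling weeks to days multiplies both the standard deviation and the mean by 7, so the ratio is unchanged.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean (unit-free)."""
    return std_dev / mean * 100

bypass = coefficient_of_variation(6, 17)   # both figures in weeks
hip = coefficient_of_variation(1, 11)      # both figures in months

print(round(bypass, 1))  # 35.3
print(round(hip, 1))     # 9.1

# Changing units (weeks -> days) leaves the coefficient unchanged.
print(round(coefficient_of_variation(6 * 7, 17 * 7), 1))  # 35.3
```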

6. Measuring Deviations From The Means: Z-Scores

Focus on single observation.

Example:

Average Male Salary = €19,500

Standard Deviation of Male Salaries = €4,750

Average Female Salary = €16,800

Standard Deviation of Female Salaries = €3,800

Individual Man’s Salary = €31,375

Individual Woman’s Salary = €26,800

Is the man doing better (relative to other men) than the woman is doing (relative to other women)?

One way to resolve the problem is to calculate the “z-score”, which gives the salary in terms of standard deviations from the mean.

$z = \frac{X - \mu}{\sigma}$

How unusually high is the individual man’s salary?

How unusually high is the individual woman’s salary?

Individual Man: z-score = (31,375 - 19,500)/4,750 = 2.5, i.e. man’s salary is 2.5 standard deviations above the mean male salary.

Individual Woman: z-score = (26,800 - 16,800)/3,800 = 2.632, i.e. woman’s salary is 2.6 standard deviations above the mean female salary.

The woman is nearer the top of her distribution than the man is to the top of his.

Z-score measures how unusually high or low an outcome is relative to the appropriate mean and standard deviation.
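The salary comparison can be sketched in Python as follows. The function name `z_score` is illustrative; the figures are those given in the example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

man = z_score(31_375, 19_500, 4_750)     # relative to male salaries
woman = z_score(26_800, 16_800, 3_800)   # relative to female salaries

print(man)              # 2.5
print(round(woman, 3))  # 2.632
print(woman > man)      # True: she is further above her group's mean
```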

Table 4: The calculation of average wealth

Range (£) / Mid-point x (£000) / f / fx
0- / 5 / 4,021 / 20,105
10,000- / 17.5 / 2,963 / 51,853
25,000- / 32.5 / 2,984 / 96,980
40,000- / 45 / 1,423 / 64,035
50,000- / 55 / 1,172 / 64,460
60,000- / 70 / 1,947 / 136,290
80,000- / 90 / 1,485 / 133,650
100,000- / 150 / 2,199 / 329,850
200,000- / 250 / 564 / 141,000
300,000- / 400 / 327 / 130,800
500,000- / 750 / 137 / 102,750
1,000,000- / 2000 / 59 / 118,000
Totals / / 19,281 / 1,389,773

Table 5: The calculation of the variance of wealth

Range (£) / Mid-point x (£000) / Frequency f / Deviation $(x - \mu)$ / $(x - \mu)^2$ / $f(x - \mu)^2$
0- / 5 / 4,021 / -67.1 / 4499.7 / 18,093,344.5
10,000- / 17.5 / 2,963 / -54.6 / 2979.0 / 8,826,673.9
25,000- / 32.5 / 2,984 / -39.6 / 1566.6 / 4,674,639.7
40,000- / 45 / 1,423 / -27.1 / 733.3 / 1,043,515.6
50,000- / 55 / 1,172 / -17.1 / 291.7 / 341,899.2
60,000- / 70 / 1,947 / -2.1 / 4.3 / 8,422.7
80,000- / 90 / 1,485 / 17.9 / 321.1 / 476,878.2
100,000- / 150 / 2,199 / 77.9 / 6071.5 / 13,351,321.7
200,000- / 250 / 564 / 177.9 / 31655.6 / 17,853,737.5
300,000- / 400 / 327 / 327.9 / 107531.6 / 35,162,831.2
500,000- / 750 / 137 / 677.9 / 459575.7 / 62,961,866.2
1,000,000- / 2000 / 59 / 1927.9 / 3716875.9 / 219,295,679.4
Totals / / 19,281 / / / 382,090,809.7

Hence we obtain