251descr2 2/10/06 (Open this document in 'Outline' view!)

G. Measures of Dispersion and Asymmetry.

1. Range

Range = highest value - lowest value, or $x_{max} - x_{min}$.

Interquartile Range: $IQR = Q_3 - Q_1$. See 251descr2ex2 for example.

2. The Variance and Standard Deviation of Ungrouped Data.

a. The Population Variance - Definitional and Computational Formulas.

The definition of the population variance is ‘the average squared deviation of measurements from the mean.’ The definitional formula just realizes this definition.

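Definitional: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Computational: $\sigma^2 = \frac{\sum x^2 - N\mu^2}{N}$

Standard Deviation = $\sigma = \sqrt{\sigma^2}$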

b. The Sample Variance.

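Definitional: $s^2 = \frac{\sum (x - \bar{x})^2}{n-1}$

Computational: $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1}$

The computational formula is one of the most important formulas you will learn. Note that $\sum x^2$ is not the same as $\left(\sum x\right)^2$. For example, if $x$ is 2, 3, 5, then $\sum x^2 = 4 + 9 + 25 = 38$, not $\left(\sum x\right)^2 = 10^2 = 100$.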

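Example: Use $x$ = 2, 3, 5, so that $n = 3$.

Computational Method (first two columns) / Definitional Method (last three columns)
$x$ / $x^2$ / $x$ / $x - \bar{x}$ / $(x - \bar{x})^2$
2 / 4 / 2 / -1.33333 / 1.77778
3 / 9 / 3 / -0.33333 / 0.11111
5 / 25 / 5 / 1.66667 / 2.77778
Sum: 10 / 38 / 10 / 0.00001 / 4.66667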

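From this we find $\sum x = 10$, $\sum x^2 = 38$ and $\bar{x} = \frac{\sum x}{n} = \frac{10}{3} = 3.33333$. Note that $\sum (x - \bar{x})$ should be zero, but is not because of rounding. Now, if we use the computational method, we can use $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{38 - 3(3.33333)^2}{2} \approx 2.3333$. (Some texts prefer $s^2 = \frac{\sum x^2 - \left(\sum x\right)^2 / n}{n-1}$, which gives us a little more accuracy for a little more work.) If we use the definitional method, $s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{4.66667}{2} \approx 2.3333$, but note that we had to do three subtractions instead of one.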
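The two methods can be checked against each other with a short Python sketch. The function names are illustrative only and are not part of these notes; the data are the three values from the example, and the formulas are the usual definitional and computational forms of the sample variance.

```python
# Sample variance of the example data by both formulas.
# Function names are illustrative assumptions, not taken from the notes.

def variance_definitional(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def variance_computational(xs):
    """s^2 = (sum(x^2) - n * xbar^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)


data = [2, 3, 5]
s2_def = variance_definitional(data)
s2_comp = variance_computational(data)

# Both methods agree once the mean is carried at full precision.
print(round(s2_def, 5), round(s2_comp, 5))
```

Because the computer carries the mean at full precision, the rounding discrepancy seen in the hand calculation does not appear here.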


c. The Coefficient of Variation.

d. Chebyshef’s Inequality and the Empirical Rule

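Chebyshef Inequality: $P(|x - \mu| \ge k\sigma) \le \frac{1}{k^2}$ or $P(|z| \ge k) \le \frac{1}{k^2}$. A z-score is the same as $\frac{x - \mu}{\sigma}$. (See explanation below)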

Empirical rule: (For Symmetrical Unimodal distributions only)

68% within one standard deviation of the mean, 95% within two and almost all (99.7%) within three.
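To see how much weaker Chebyshef's guarantee is than the Empirical Rule, the following Python sketch compares the two for 1, 2 and 3 standard deviations. It assumes the usual form of the inequality (at least $1 - \frac{1}{k^2}$ of the data lie within $k$ standard deviations) and uses the Normal distribution to stand in for a symmetrical unimodal distribution. The function names are illustrative.

```python
import math

# Function names are illustrative assumptions, not taken from the notes.

def chebyshev_lower_bound(k):
    """Minimum proportion within k standard deviations for ANY distribution.

    The bound is only informative for k > 1 (for k = 1 it is zero).
    """
    return 1 - 1 / k ** 2


def normal_proportion_within(k):
    """Proportion of a Normal distribution within k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k,
          round(chebyshev_lower_bound(k), 4),
          round(normal_proportion_within(k), 4))
```

The Normal proportions come out close to the 68%, 95% and 99.7% of the Empirical Rule, while Chebyshef's Inequality only promises 0%, 75% and about 89%. That is the price of a rule that holds for every distribution.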

3. The Variance and Standard Deviation of Grouped Data.

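For grouped data, generally substitute $\sum f$ for $\sum$ (for example, $\sum fx^2$ for $\sum x^2$), where $f$ is the class frequency and $x$ is the class midpoint.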

4. Skewness and Kurtosis.

Define Population Skewness, the 3rd k-statistic, coefficients of Skewness; Population Kurtosis, the 4th k-statistic, the Coefficient of Excess; Leptokurtic, Platykurtic and Mesokurtic distributions.

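The usual measurement of skewness is often called the third moment about the mean, $\mu_3$. (The population variance is the second.) The formula for population skewness is:

$\mu_3 = \frac{\sum (x - \mu)^3}{N}$.

The corresponding sample statistic is the third k-statistic, $k_3 = \frac{n}{(n-1)(n-2)} \sum (x - \bar{x})^3$. The corresponding computational formulas are

$\mu_3 = \frac{1}{N}\left[\sum x^3 - 3\mu \sum x^2 + 2N\mu^3\right]$ and $k_3 = \frac{n}{(n-1)(n-2)}\left[\sum x^3 - 3\bar{x} \sum x^2 + 2n\bar{x}^3\right]$. To make grouped data formulas, put an $f$ to the right of the $\sum$ sign. Positive values of these formulas imply skewness to the right, negative values to the left.

Note that multiplying all the values of $x$ by two would multiply the values of these coefficients by eight, but would not change the shape of the distribution. If we want to compare shapes, we need measurements that will not change if we multiply all values by a constant. Such a measure would be called the coefficient of relative skewness, with the formulas $\gamma_1 = \frac{\mu_3}{\sigma^3}$ and $g_1 = \frac{k_3}{s^3}$. Note that for the Normal distribution $\gamma_1 = 0$.

Other measures of skewness are Pearson's measures of skewness, $SK_1 = \frac{\text{mean} - \text{mode}}{\text{std. deviation}}$ and $SK_2 = \frac{3(\text{mean} - \text{median})}{\text{std. deviation}}$. These are roughly equivalent, since, for a moderately skewed distribution, $\text{mean} - \text{mode} \approx 3(\text{mean} - \text{median})$. It seems that $-3 \le SK_2 \le 3$ and that values between 1 and -1 are considered to indicate moderate skewness.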
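The point about multiplying all values by a constant can be illustrated with a small Python sketch. It assumes the usual definition of the third k-statistic, $k_3 = \frac{n}{(n-1)(n-2)} \sum (x - \bar{x})^3$, and relative skewness $g_1 = k_3 / s^3$. The data set and the function names are hypothetical and are not taken from these notes.

```python
import math

# Hypothetical data and illustrative function names (not from the notes).

def k3(xs):
    """Third k-statistic: n / ((n-1)(n-2)) * sum((x - xbar)^3)."""
    n = len(xs)
    xbar = sum(xs) / n
    return n / ((n - 1) * (n - 2)) * sum((x - xbar) ** 3 for x in xs)


def sample_sd(xs):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))


def relative_skewness(xs):
    """g1 = k3 / s^3, which does not change when x is rescaled."""
    return k3(xs) / sample_sd(xs) ** 3


data = [1, 2, 2, 3, 7]             # skewed to the right
doubled = [2 * x for x in data]    # same shape, twice the scale

ratio = k3(doubled) / k3(data)     # 8, up to floating-point rounding
print(ratio)
print(relative_skewness(data), relative_skewness(doubled))
```

Doubling the data multiplies $k_3$ by $2^3 = 8$ and $s^3$ by 8 as well, so the ratio $g_1$ is unchanged.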


Example:

Profit Rate / $f$ / $x$ (midpoint) / $fx$ / $fx^2$ / $fx^3$
9-10.99 / 3 / 10 / 30 / 300 / 3000
11-12.99 / 3 / 12 / 36 / 432 / 5184
13-14.99 / 5 / 14 / 70 / 980 / 13720
15-16.99 / 3 / 16 / 48 / 768 / 12288
17-18.99 / 1 / 18 / 18 / 324 / 5832
Total / 15 / / 202 / 2804 / 40024

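So $n = \sum f = 15$, $\sum fx = 202$, $\sum fx^2 = 2804$, $\sum fx^3 = 40024$, so that $\bar{x} = \frac{\sum fx}{n} = \frac{202}{15} = 13.4667$ and $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{2804 - 2720.2667}{14} = 5.9810$, which means $s = \sqrt{5.9810} = 2.4456$. To measure skewness, use one of the following three results.

$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x} \sum fx^2 + 2n\bar{x}^3\right] = \frac{15}{(14)(13)}(8.249) = 0.680$ (the bracket is a small difference between large numbers, so $\bar{x}$ must not be rounded when computing it), or Relative Skewness $g_1 = \frac{k_3}{s^3} = \frac{0.680}{(2.4456)^3} = 0.0465$, or Pearson's Measure of Skewness $SK_1 = \frac{\text{mean} - \text{mode}}{s} = \frac{13.4667 - 14}{2.4456} = -0.218$ (taking the mode to be 14, the midpoint of the modal class). Note that, in this case, Pearson's Measure 1 and Relative Skewness contradict each other as to the direction of skewness.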
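The grouped-data arithmetic in this example can be reproduced with the Python sketch below. It assumes the computational formulas $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1}$ and $k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x} \sum fx^2 + 2n\bar{x}^3\right]$. The variable names are illustrative; the midpoints and frequencies come from the table.

```python
import math

# Class midpoints and frequencies from the profit-rate table.
midpoints = [10, 12, 14, 16, 18]
freqs = [3, 3, 5, 3, 1]

# Column totals of the table.
n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, midpoints))
sum_fx3 = sum(f * x ** 3 for f, x in zip(freqs, midpoints))
print(n, sum_fx, sum_fx2, sum_fx3)  # 15 202 2804 40024

# Mean, variance and standard deviation (computational formulas).
xbar = sum_fx / n
s2 = (sum_fx2 - n * xbar ** 2) / (n - 1)
s = math.sqrt(s2)

# Third k-statistic and relative skewness. The bracket is a small
# difference of large numbers, so xbar is kept at full precision.
bracket = sum_fx3 - 3 * xbar * sum_fx2 + 2 * n * xbar ** 3
k3 = n / ((n - 1) * (n - 2)) * bracket
g1 = k3 / s ** 3

print(round(xbar, 4), round(s2, 4), round(s, 4))
print(round(k3, 3), round(g1, 4))
```

Rounding $\bar{x}$ to four decimal places before cubing it changes the bracket noticeably, which is why carrying extra decimal places matters in this formula.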

The measures of kurtosis are, for populations, $\mu_4 = \frac{\sum (x - \mu)^4}{N}$ and, for samples, $k_4 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum (x - \bar{x})^4 - \frac{3(n-1)^2}{(n-2)(n-3)} s^4$. $k_4$ can be considered an estimate of $\mu_4 - 3\sigma^4$. To get a measurement of shape use the Coefficient of Excess, $\gamma_2 = \frac{\mu_4 - 3\sigma^4}{\sigma^4}$ or $g_2 = \frac{k_4}{s^4}$. Since the Normal distribution has $\mu_4 = 3\sigma^4$, the coefficient of excess is zero for the Normal distribution.

Kurtosis has traditionally been considered a measure of the peakedness of a distribution relative to the Normal distribution, though there are some exceptions to this interpretation. If the coefficient of excess is positive, we may call a distribution leptokurtic or sharp-peaked (and long-tailed). If the coefficient of excess is negative, the distribution can be called platykurtic or flat-peaked (and short-tailed). If the coefficient of excess is close to zero, we call the distribution mesokurtic, middle-peaked. A symmetric, mesokurtic distribution is essentially Normal.

An alternate measure, called simply the coefficient of kurtosis, is $\frac{\frac{1}{2}(Q_3 - Q_1)}{P_{90} - P_{10}}$, where $P_{10}$ and $P_{90}$ are the 10th and 90th percentiles. This is dimension-free and takes values between zero and 0.5. Values below .263 (the value for the Normal distribution) indicate a leptokurtic distribution, because long tails widen $P_{90} - P_{10}$ relative to the interquartile range. Values above .263 indicate a platykurtic distribution.
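A minimal Python sketch of the sample kurtosis measures follows. It assumes the standard formula for the fourth k-statistic, $k_4 = \frac{n(n+1) \sum (x - \bar{x})^4 - 3(n-1)\left[\sum (x - \bar{x})^2\right]^2}{(n-1)(n-2)(n-3)}$, and the coefficient of excess $g_2 = k_4 / s^4$. The data, the function names and the cutoff used for "close to zero" are hypothetical choices for illustration.

```python
# Hypothetical data, names and tolerance (not taken from the notes).

def k4(xs):
    """Fourth k-statistic (needs n > 3)."""
    n = len(xs)
    xbar = sum(xs) / n
    ss2 = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    ss4 = sum((x - xbar) ** 4 for x in xs)   # sum of fourth powers
    numerator = n * (n + 1) * ss4 - 3 * (n - 1) * ss2 ** 2
    return numerator / ((n - 1) * (n - 2) * (n - 3))


def coefficient_of_excess(xs):
    """g2 = k4 / s^4, where s^2 is the sample variance."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return k4(xs) / s2 ** 2


def classify(g2, tol=0.1):
    """Label the shape; tol is an arbitrary cutoff for 'close to zero'."""
    if g2 > tol:
        return "leptokurtic"
    if g2 < -tol:
        return "platykurtic"
    return "mesokurtic"


data = [1, 2, 3, 4, 5]   # evenly spread values: flat, short-tailed
g2 = coefficient_of_excess(data)
print(k4(data), g2, classify(g2))  # -7.5 -1.2 platykurtic
```

Evenly spaced data have no peak and no tails to speak of, so the coefficient of excess comes out clearly negative.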


Example (using definitional formulas):

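Profit Rate / $f$ / $x$ (midpoint) / $fx$ / $x - \bar{x}$ / $f(x - \bar{x})$ / $f(x - \bar{x})^2$ / $f(x - \bar{x})^3$ / $f(x - \bar{x})^4$
9-10.99 / 3 / 10 / 30 / -3.467 / -10.400 / 36.053 / -124.985 / 433.281
11-12.99 / 3 / 12 / 36 / -1.467 / -4.400 / 6.453 / -9.465 / 13.882
13-14.99 / 5 / 14 / 70 / 0.533 / 2.667 / 1.422 / 0.759 / 0.405
15-16.99 / 3 / 16 / 48 / 2.533 / 7.600 / 19.253 / 48.775 / 123.564
17-18.99 / 1 / 18 / 18 / 4.533 / 4.533 / 20.551 / 93.164 / 422.348
Total / 15 / / 202 / / 0.000 / 83.732 / 8.249 / 993.479

So $n = \sum f = 15$, $\sum fx = 202$, $\sum f(x - \bar{x})^2 = 83.7333$ (83.732 in the table, because of rounding), $\sum f(x - \bar{x})^3 = 8.249$, $\sum f(x - \bar{x})^4 = 993.479$

and $\bar{x} = \frac{\sum fx}{n} = \frac{202}{15} = 13.4667$, so that $s^2 = \frac{\sum f(x - \bar{x})^2}{n-1} = \frac{83.7333}{14} = 5.9810$, which means $s = \sqrt{5.9810} = 2.4456$.

To measure skewness, use one of the following three results. $k_3 = \frac{n}{(n-1)(n-2)} \sum f(x - \bar{x})^3 = \frac{15}{(14)(13)}(8.249) = 0.680$, or Relative Skewness $g_1 = \frac{k_3}{s^3} = \frac{0.680}{(2.4456)^3} = 0.0465$, or Pearson's Measure of Skewness $SK_1 = \frac{\text{mean} - \text{mode}}{s} = \frac{13.4667 - 14}{2.4456} = -0.218$ (taking the mode to be 14, the midpoint of the modal class). Note that, in this case, Pearson's Measure and Relative Skewness contradict each other as to the direction of skewness.

$k_4 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum f(x - \bar{x})^4 - \frac{3(n-1)^2}{(n-2)(n-3)} s^4 = \frac{15(16)}{(14)(13)(12)}(993.479) - \frac{3(14)^2}{(13)(12)}(5.9810)^2 = 109.173 - 134.832 = -25.659$. So $g_2 = \frac{k_4}{s^4} = \frac{-25.659}{(5.9810)^2} = -0.717$. The negative sign implies that the distribution is platykurtic.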

5. Review

a. Grouped Data. See 251dscr_D

b. Ungrouped Data. See 251dscr_D

Appendix: Explanation of Sample Formulas (Not for student consumption until you know about expected value.) See 251dscr_B.

Appendix: Explanation of Computational Formulas (The part about the variance is fairly easy, the rest is more difficult.) See 251dscr_C.


Appendix: Explanation of Chebyshef’s Inequality

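Make a diagram. Show a curve that looks like a Normal curve with the middle marked $\mu$. Mark off two points on either side of $\mu$ on your $x$ axis at equal distances from $\mu$. Label these points $\mu - k\sigma$ and $\mu + k\sigma$. ($k$ can be any number above one, like 1.32 or 5.) The areas below $\mu - k\sigma$ and above $\mu + k\sigma$ are the left and right tails of the distribution.

Then the statement $P(|x - \mu| \ge k\sigma) \le \frac{1}{k^2}$, or $P(|z| \ge k) \le \frac{1}{k^2}$, means that the total proportion of points that is in these two tails cannot be greater than $\frac{1}{k^2}$. The statement $P(\mu - k\sigma < x < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$, or $P(-k < z < k) \ge 1 - \frac{1}{k^2}$, means that the proportion of points that is between $\mu - k\sigma$ and $\mu + k\sigma$ must be at least $1 - \frac{1}{k^2}$.

For example, suppose $\mu = 15$, $\sigma = 3$ and $k = 1.32$. Then $\mu - k\sigma = 15 - 1.32(3) = 11.04$, $\mu + k\sigma = 15 + 1.32(3) = 18.96$, and the proportion of points between 11.04 and 18.96 is at least $1 - \frac{1}{(1.32)^2} = 0.4261$ or 42.61%. The proportion of points in the tails is at most 57.39%.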
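The numbers in this example can be reproduced with a few lines of Python. The values $\mu = 15$, $\sigma = 3$ and $k = 1.32$ are assumptions chosen because they give the interval 11.04 to 18.96 and the percentages quoted above; the variable names are illustrative.

```python
# Assumed inputs: they reproduce the interval 11.04 to 18.96 in the text.
mu = 15.0
sigma = 3.0
k = 1.32

# Ends of the interval mu - k*sigma to mu + k*sigma.
lower = mu - k * sigma
upper = mu + k * sigma

# Chebyshef bounds: at most 1/k^2 in the two tails combined,
# at least 1 - 1/k^2 between the two points.
tail_max = 1 / k ** 2
inside_min = 1 - tail_max

print(round(lower, 2), round(upper, 2))                    # 11.04 18.96
print(round(100 * inside_min, 2), round(100 * tail_max, 2))  # 42.61 57.39
```

Trying a larger value such as `k = 5` shows how quickly the guarantee tightens: at most 4% of the points can lie more than five standard deviations from the mean.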

Measures of Inequality

Measuring Inequality, a PowerPoint presentation which explains various measures of income inequality, is available on the ECAAR website. This presentation was prepared by Paul Burkholder, ECAAR's Project Manager, as part of ECAAR’s project on "Inequality and Democratic Development." Paul is a recent graduate of Temple University, with a degree in economics. He does research for current and potential projects, and assists with media and member outreach.
See

Correction?:

In response to a student query (Thank you!), a small correction was made above in the computations for the variance using grouped data and definitional formulas. However, the results $\sum f(x - \bar{x})^2 = 83.7333$, so that $s^2 = 5.9810$, bear more explanation. If you used the numbers given here, you would have gotten $\sum f(x - \bar{x})^2 = 83.732$ and $s^2 = 5.9809$. My result occurred because I tend to carry more decimal places than I admit, something that may occur in other places in these notes. Obviously, if your answers differ from mine because of this sort of rounding error, I have no business calling them wrong.