Measures of Variation

Measures of Variation:

Variation – how spread out the data are.

Range – difference between the largest and smallest values of the distribution

-  indicates the variation between the smallest and largest entries, but does not tell how much other values vary from one another.

Range = largest value – smallest value

Ex. 45, 72, 88, 91, 27, 11, 99, 66, and 10: Range = 99 – 10 = 89
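As a quick check, the range computation can be sketched in Python. This is a minimal illustration; the function name `data_range` is our own and not from any library.

```python
def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


example = [45, 72, 88, 91, 27, 11, 99, 66, 10]
print(data_range(example))  # 89 (that is, 99 - 10)
```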

Look at the example on page 121 to see how the range can sometimes be deceiving.

Standard Deviation – measurement that will give you a better understanding of how the data entries differ from the mean.

-  formula differs depending on whether you are using an entire population or just a sample.

Sample Standard Deviation:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where x is any entry in the distribution, x bar (x̄) is the sample mean, and n is the number of entries.

*** Notice that the standard deviation uses the difference between each entry x and the mean x bar. The quantity (x – x bar) is negative whenever the mean is greater than the entry. If you simply added these differences, Σ(x – x bar), the negative values would cancel the positive values; in fact, this sum always equals 0, so it would report no variation even when some entries are far from the mean. Squaring each difference makes every term nonnegative, so the terms can no longer cancel.
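To see the cancellation concretely, here is a small Python sketch using the Broadway play lengths from the worked example later in this section. It is only an illustration; the variable names are our own.

```python
data = [12, 45, 36, 118, 50, 7, 20]  # Broadway play lengths (days)
x_bar = sum(data) / len(data)

deviations = [x - x_bar for x in data]  # some negative, some positive
squared = [d ** 2 for d in deviations]  # every term is >= 0

# The plain deviations cancel out: their sum is 0 apart from floating-point error.
print(abs(sum(deviations)) < 1e-9)  # True
# The squared deviations cannot cancel, so their sum reflects the spread.
print(round(sum(squared), 2))       # 8488.86
```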

*** If we were working with an entire population, we would use the population mean μ in place of x bar and divide by N, the population size, instead of n – 1. The result is the mean of the squared differences (x – μ)².
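Written as formulas, the population variance and population standard deviation described above are as follows. Here σ is the conventional symbol for the population standard deviation (an addition for this sketch; the paragraph above names only μ and N).

```latex
\sigma^2 = \frac{\sum (x - \mu)^2}{N}
\qquad
\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}
```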

We get our sample standard deviation formula from the formula for the variance of a sample, denoted by s²:

Sample Variance:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
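The two formulas can be checked against each other with a short Python sketch. The helper names `sample_variance` and `sample_std_dev` are our own; `statistics.variance` and `statistics.stdev` from Python's standard library also divide by n – 1, so they serve as a cross-check. The data are the nine entries from the range example above, treated here as a sample.

```python
import math
import statistics


def sample_variance(values):
    """Sample variance: s^2 = sum((x - x_bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two entries")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [45, 72, 88, 91, 27, 11, 99, 66, 10]
print(round(sample_variance(data), 2))  # 1194.28
print(round(sample_std_dev(data), 2))   # 34.56

# Cross-check with the standard library, which also divides by n - 1.
print(math.isclose(sample_variance(data), statistics.variance(data)))  # True
print(math.isclose(sample_std_dev(data), statistics.stdev(data)))      # True
```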

Look at both of these formulas above. What difference do you see?

When we are working by hand with problems involving standard deviation, it is a good idea to break up our formula into steps and to create a table to aid us along the way.

The following example will show you how to break your formula up and will give you a table to guide you along.

Ex. A random sample of seven New York plays gave the following information about how long each play ran on Broadway (in days):

12 45 36 118 50 7 20

a)  Find the range.

b)  Find the sample mean.

c)  Find the sample standard deviation.

Solution:

Part A is fairly simple: our largest value is 118 and our smallest value is 7. Substituting these into the range formula gives:

Range = largest value – smallest value

Range = 118 - 7 = 111 days

Part B asks for the sample mean. We add up all of the entries and divide by the number of entries: 288 / 7 ≈ 41.14 days.
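Parts A and B can be reproduced in a few lines of Python. This is an illustrative sketch, not part of the original worksheet; the variable names are our own.

```python
data = [12, 45, 36, 118, 50, 7, 20]  # Broadway run lengths in days

# Part A: range = largest value - smallest value
value_range = max(data) - min(data)

# Part B: sample mean = sum of entries / number of entries
total = sum(data)
n = len(data)
x_bar = total / n

print(value_range)                # 111
print(total, n, round(x_bar, 2))  # 288 7 41.14
```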

Part C is where it gets a little tricky. Let’s create a chart that breaks down the standard deviation formula.

Length of Broadway Plays (in days):

x        | x – x bar             | (x – x bar)²
7        | 7 – 41.14 = -34.14    | 1165.54
12       |                       |
20       |                       |
36       |                       |
45       |                       |
50       |                       |
118      |                       |
Σx = 288 |                       | Σ(x – x bar)² =

We listed the entries in increasing order in the x column, then summed that column and placed the total at the bottom. Next, use the sample mean from Part B to complete the x – x bar column. (The first row is already completed!)

The last column is the square of column 2 for each row. (We will round to the nearest hundredth!) Then calculate the sum of column 3.
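If you want to check your completed chart, the following Python sketch builds the same three columns. It assumes, as the notes do, that the mean is first rounded to 41.14 and that each entry in columns 2 and 3 is rounded to the nearest hundredth; the variable names are our own.

```python
data = [12, 45, 36, 118, 50, 7, 20]
x_bar = round(sum(data) / len(data), 2)  # 41.14, rounded as in Part B

rows = []
for x in sorted(data):           # entries in increasing order
    dev = round(x - x_bar, 2)    # column 2: x - x bar
    sq = round(dev ** 2, 2)      # column 3: (x - x bar)^2, nearest hundredth
    rows.append((x, dev, sq))

print(f"{'x':>5} | {'x - x bar':>10} | {'(x - x bar)^2':>14}")
for x, dev, sq in rows:
    print(f"{x:>5} | {dev:>10.2f} | {sq:>14.2f}")

sum_x = sum(x for x, _, _ in rows)
sum_sq = round(sum(sq for _, _, sq in rows), 2)
print(f"Sum x = {sum_x}, Sum (x - x bar)^2 = {sum_sq}")
```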

After we have completed this chart, we need to take care of the denominator of our formula by figuring out what n is equal to.

n = ______, therefore n – 1 = ______

We will now take Σ(x – x bar)² = ______ and divide it by n – 1 = ______.

What is the result? ______

If we think about it, this answer only gives us a sample variance. What do you think we should do to the result above to come up with the sample standard deviation? Why?

s = ______
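Here is a Python sketch of the last two steps, starting from the column-3 total that the rounded chart produces (8488.86). It assumes that rounded total; for comparison it also calls the standard library's `statistics.stdev`, which works from the unrounded data.

```python
import math
import statistics

sum_sq = 8488.86  # column-3 total from the chart (entries rounded to hundredths)
n = 7             # number of plays in the sample

variance = sum_sq / (n - 1)  # sample variance s^2, in days^2
s = math.sqrt(variance)      # square root brings the units back from days^2 to days

print(n - 1)               # 6
print(round(variance, 2))  # 1414.81
print(round(s, 2))         # 37.61

# Same result from the unrounded data, using the standard library
data = [12, 45, 36, 118, 50, 7, 20]
print(round(statistics.stdev(data), 2))  # 37.61
```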

Let’s go through the following examples to get a better sense of what we are trying to accomplish:

1)  Petroleum pollution in oceans is known to increase the growth of a certain type of bacteria. Brian did a project for his ecology class in which he made a bacteria count (per 100 milliliters) in nine random samples of sea water. His counts gave the following readings: 17 23 18 19 21 16 12 15 18

a)  Find the range.

b)  Find the sample mean.

c)  Find the sample standard deviation.

2) In the process of tuna fishing, porpoises are sometimes accidentally caught and killed. A U.S. oceanographic institute wants to study the number of porpoises killed. Records from eight commercial tuna fishing fleets gave the following information about the number of porpoises killed in a three-month period: 6 18 9 0 15 3 10 2

a) Find the range.

b) Find the sample mean.

c) Find the sample standard deviation.

3)  The neighborhood association of Cherry Hills Village took a survey of opinions about rent control in their neighborhood. In this opinion poll, 1 = strongly against rent control and 10 = strongly in favor of rent control. A random sample of 14 people gave the following opinions: 1 1 2 1 10 1 10 10 8 10 2 10 8 1

a)  Compute the range, sample mean, and sample standard deviation of opinion ratings about rent control.

Another questionnaire asked for opinions about moving a mailbox from one side of the street to the other. Again, a random sample of 15 people gave the following opinions, where 1 = strongly disagree and 10 = strongly agree: 5 5 5 4 5 5 5 6 5 5 6 5 6 5

b)  Compute the range, sample mean, and sample standard deviation of these numbers.

c)  Compare your answers for parts a and b. Were the means about the same? Were the opinions on the two issues distributed differently? How did the range and standard deviation reflect this when the mean did not? Explain your answer!

4) Black Hole Pizza Parlor instructs its cooks to put a “handful” of cheese on each large pizza. A random sample of six such handfuls was weighed. The weights, to the nearest ounce, were: 3 2 3 4 3 5

a)  Find the mode, median, and mean weight of the handfuls of cheese.

b)  Find the range and sample standard deviation of the weights.

c)  A new cook used to play football and has large hands. His handful of cheese weighs 6 ounces. Replace the 2-ounce data value with 6 ounces. Recalculate the mode, median, and mean. Which average changed the most? Comment on the changes!

5) City Hospital has a temporary shortage of nurses, so the nurses have been working overtime. A random sample of six nurses reported that the overtime hours they worked last week were: 7 2 4 5 4 3

a)  Compute the mode, median, and mean of the overtime hours.

b)  Compute the range and sample standard deviation.

c)  Suppose a recording error occurred, and the data value of 7 was replaced by 2. Recompute the mode, median, and mean, and comment on how changing the data affected these averages.

*** We can also use the calculator to help us solve standard deviation problems! If we create a list under the STAT menu, we see a chart beginning to develop. If we scroll over to the second column (L2), we can tell the calculator exactly what we would like that column to calculate (and likewise for L3).

*** Notice that after we create a list, the 1-VAR STATS function gives us the sample mean and standard deviation of our entries. Sx is the sample standard deviation; the σx value listed alongside it is the population standard deviation.
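For readers without a graphing calculator, a rough Python analog of a 1-VAR STATS style summary is sketched below. The function `one_var_stats` and its dictionary keys are hypothetical names chosen to mirror common calculator labels; they are not part of any calculator or library API. The statistics come from Python's standard `statistics` module.

```python
import statistics


def one_var_stats(values):
    """Hypothetical Python analog of a calculator's 1-Var Stats summary."""
    return {
        "x_bar": statistics.mean(values),      # sample mean
        "sum_x": sum(values),                  # Σx
        "sum_x2": sum(x * x for x in values),  # Σx²
        "Sx": statistics.stdev(values),        # sample standard deviation (divides by n - 1)
        "sigma_x": statistics.pstdev(values),  # population standard deviation (divides by n)
        "n": len(values),
    }


stats = one_var_stats([12, 45, 36, 118, 50, 7, 20])
for label, value in stats.items():
    print(f"{label} = {round(value, 2)}")
```

The Broadway data are used here only so the output can be compared with the worked example.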