Rare Event Rule

Module 1 – Measures of Center

Mode

 the data value that occurs most frequently

for categorical (or nominal) data, this is the only measure of center that can be used

 There may be one, more than one, or no mode
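
As a minimal sketch of the three cases just listed (one mode, more than one mode, no mode), the Python below counts how often each value occurs. The function name `modes` and the sample lists are illustrative assumptions, and treating data in which no value repeats as having "no mode" is one common convention, not the only one.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # convention assumed here: no value repeats, so no mode
    return [value for value, count in counts.items() if count == highest]

print(modes([6, 7, 7, 8]))             # [7]        one mode
print(modes([6, 6, 7, 8, 8]))          # [6, 8]     two modes
print(modes([6, 7, 8]))                # []         no mode
print(modes(["red", "blue", "red"]))   # ['red']    categorical data
```

Because the function only counts occurrences and never adds or orders the values, it also works for categorical data, as the last call shows.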

Mean (also known as the Arithmetic Mean)

● Add the data values and then divide by the number of values

● for a Sample, mean is $\bar{x} = \frac{\sum x}{n}$ Note: $\bar{x}$ is pronounced “x bar”

Notation used above:

The symbol $\Sigma$ is the Greek letter capital sigma. It means “sum (add up) all these numbers”.

So, $\sum x$ means to add up all the values that the x has in the problem.

For a data set, x represents the value of a data item. So $\sum x$ means to sum together all the data values.

n = # of data values in a sample. N = # of data values in a population.

● for a Population, mean is $\mu = \frac{\sum x}{N}$ Note: $\mu$ is the Greek letter “mu”

Mean is affected by every data value, especially the extreme data values. So, the mean is very sensitive to outliers.
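
The sensitivity to outliers can be seen in a short Python sketch. The helper name `mean` and both data lists are made up for illustration; the second list is the first with its largest value replaced by an extreme one.

```python
def mean(values):
    """Add the data values, then divide by the number of values."""
    return sum(values) / len(values)

typical = [15, 17, 20, 22, 32]
with_outlier = [15, 17, 20, 22, 320]   # same data, last value made extreme

print(mean(typical))        # 21.2
print(mean(with_outlier))   # 78.8
```

Changing a single value moved the mean from 21.2 to 78.8, well above four of the five data values.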

Median

 1) Arrange the values in order

2) The middle value is the Median

[if there are two values in the middle, average them]

 Median is NOT affected by extreme data values (it is resistant to outliers)

 the position of the median is (n + 1)/2 when the data is written in order of size

Example A data: 15, 17, 20, 22, 32  notice the data is in order

n = 5 data items. Position of median = (5+1)/2 = 3rd.

The third data item is 20. The median is 20.

Example B data: 15, 17, 20, 22, 32, 36  notice the data is in order

n = 6 data items. Position of median = (6+1)/2 = 3 ½

Position “3 ½” is half way between the 3rd and 4th. The 3rd item is 20, the 4th item is 22.

The median is halfway between 20 and 22. So the median is 21.
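
Examples A and B can be reproduced with a small Python sketch of the position rule. The function name `median` is an assumption for illustration, and the sketch assumes a non-empty list of numbers; the third call uses a made-up list with an extreme value to show the resistance to outliers.

```python
import math

def median(values):
    """Median found from the 1-based position (n + 1) / 2 in the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2
    below = ordered[math.floor(position) - 1]
    above = ordered[math.ceil(position) - 1]
    return (below + above) / 2   # equals the middle value when n is odd

print(median([15, 17, 20, 22, 32]))       # 20.0  (Example A)
print(median([15, 17, 20, 22, 32, 36]))   # 21.0  (Example B)
print(median([15, 17, 20, 22, 320]))      # 20.0  (extreme value, same median)
```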

Weighted Mean

 is used when you want to assign a different importance to the different data values

 each data value is multiplied by its “weight” or “importance”. Then those products are added up. Then divide by the total of the “weights”.

 in symbols, weighted mean = $\frac{\sum (w \cdot x)}{\sum w}$, where w is the weight of the data value x

Example: Finding the gpa (grade point average) for a term when the student gets a 4.0 in the 5-credit chemistry class and a 3.2 in the 3-credit music class. The two grades are not equally important. The “weight” for each grade is the number of credits that the course is worth. The gpa is the weighted mean:

GPA = $\frac{4.0 \cdot 5 + 3.2 \cdot 3}{5 + 3} = \frac{29.6}{8}$ = 3.7
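
The GPA calculation can be sketched in Python as follows. The function name `weighted_mean` and the variable names are illustrative assumptions; the numbers are the grades and credits from the example.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total = sum(w * x for x, w in zip(values, weights))
    return total / sum(weights)

grades = [4.0, 3.2]    # chemistry, music
credits = [5, 3]       # the weights

gpa = weighted_mean(grades, credits)
print(round(gpa, 2))   # 3.7
```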

Weighted Mean Example:

Researchers at Acme groceries studied how long customers had to stand in the check-out line.

-One day 35 customers spent on average 7.3 minutes each in the check-out line (this is a mean average).

-The next day 27 customers spent a mean average of 6.5 minutes each.

Altogether, what is the mean time customers spent in the check-out line on the two days?

------

Note: this answer is NOT (7.3 + 6.5) / 2

because that would give equal importance or “weight” to the two days. The number of customers on the two days was not the same, so the two days should not have equal weight.

Answer:

Use a weighted mean. The 7.3 minutes is the average of 35 customers, and the 6.5 minutes is the average of 27 customers.

The number of customers is the “weight” that should be used for each data value.

Mean = $\frac{35 \cdot 7.3 + 27 \cdot 6.5}{35 + 27} = \frac{431}{62} \approx 6.95$ minutes
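
One way to see why the customer counts are the right weights: multiplying each day's mean by that day's customer count recovers the total minutes waited, and dividing by the total number of customers gives the overall mean. A minimal Python sketch, with variable names chosen only for illustration:

```python
customers = [35, 27]          # number of customers each day (the weights)
mean_minutes = [7.3, 6.5]     # mean wait each day

# Total minutes waited = customers * mean wait, summed over both days.
total_minutes = sum(c * m for c, m in zip(customers, mean_minutes))
total_customers = sum(customers)

overall_mean = total_minutes / total_customers
unweighted = sum(mean_minutes) / len(mean_minutes)

print(round(overall_mean, 2))   # 6.95
print(round(unweighted, 2))     # 6.9  (the tempting but incorrect answer)
```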

Finding median, mean, and mode from data in a histogram

Here is data on Quiz Grades, displayed in a histogram.

Note: Each grade was a whole number from 0 to 10.

a) How many items are in this dataset? (that is, how many quiz grades are represented here?)

We find this by adding the frequency of each column: n = 2+2+4+9+7+8 = 32

b) What are the actual data values?

We do not know the first two grades exactly, but they are less than 6. Let’s call them x and y. Then the data values, written in order, are:

x,y, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10

1. Find the median:

The number of items in the data set = n = 32

So the position of the median is at

(n+1)/2 = (32 + 1)/ 2 = 33/2 = 16.5

So the median is the 16.5th item in the list. That means the median is half way between the 16th item and the 17th item.

Looking at the list of data values, we see that the 16th item is 8 and the 17th item is 8. Halfway between 8 and 8 is still 8, so the median is 8.

So, half the students got 8 or above, half got 8 or below.

• Looking at the histogram (not the list) can you visualize the list in your head and figure out what the median is (just by examining the histogram)?

2. Is the mean greater than 8 or less than 8? (We cannot calculate the mean since the first two data values are not known. Simply look at the shape of the distribution and figure out this answer.)

Since the distribution is skewed left, we know that the more extreme values are on the left (they are small numbers) – and those values will “pull” the mean away from the middle towards those extreme values. Thus the mean will be less than the median. Therefore we know that the mean is less than 8. (It may be close to 8, but is a bit less than 8.)

3. What is the mode?

The most frequently occurring data value is the mode. It is 8.
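
The median and mode can also be read straight from the bar heights without writing out the list, by keeping a running total of the frequencies. The sketch below is one possible way to do that in Python; the names `bars` and `label_at` are illustrative, and the label "below 6" stands in for the two unknown grades x and y.

```python
# (grade, frequency) pairs read off the histogram, left to right.
# The first bar only tells us "below 6", so its label is a string.
bars = [("below 6", 2), (6, 2), (7, 4), (8, 9), (9, 7), (10, 8)]

def label_at(bars, position):
    """Return the bar label that holds the item at a 1-based position."""
    running_total = 0
    for label, frequency in bars:
        running_total += frequency
        if position <= running_total:
            return label
    raise ValueError("position is past the end of the data")

n = sum(frequency for _, frequency in bars)
low_item = label_at(bars, (n + 1) // 2)    # 16th item when n = 32
high_item = label_at(bars, (n + 2) // 2)   # 17th item when n = 32

# Assumes neither middle item falls in the "below 6" bar.
median = (low_item + high_item) / 2

# The mode is the label of the tallest bar.
mode = max(bars, key=lambda bar: bar[1])[0]

print(n, median, mode)   # 32 8.0 8
```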

Finding Measures of Center from a Grouped Frequency Distribution Table (GFDT)

Example:

Classes / Frequency
20-29 / 9
30-39 / 12
40-49 / 3
50-59 / 1

Mean

To get the mean we want to add all the data values and divide by the number of data values. But we don’t know exactly what the data values are. We know that 9 data values are in the class 20-29, but don’t know which numbers they are exactly.

To get around this problem, we assume that all data values in a class are equal to the class midpoint.

So, first find the midpoint of each class. The midpoint of the class is found by averaging the stated class limits. For example, the midpoint of the class 20-29 is (20 + 29)/2 = 49/2 = 24.5

Add a column at the front of the table giving the class midpoints:

Class Midpoint / Classes / Frequency
24.5 / 20-29 / 9
34.5 / 30-39 / 12
44.5 / 40-49 / 3
54.5 / 50-59 / 1

We also want to find the number of data values, which is the sum of all the frequencies. The sum of the frequencies is denoted $\sum f$. In this example $\sum f$ = 9 + 12 + 3 + 1 = 25.

To find the mean we add all the data values and divide by the number of data values. To add the data values we would add 24.5 nine times (since the frequency of the first class is 9, and we assume that all those data values are the midpoint 24.5), and then add 34.5 twelve times (since the frequency of the second class is 12, and we assume all those data values are the midpoint 34.5) etc.

For this example, the mean equals (9 • 24.5 + 12 • 34.5 + 3 • 44.5 + 1 • 54.5) / 25

Which equals 822.5/25 = 32.9, so the mean = 32.9.
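
The midpoint calculation for this table can be sketched in Python as below. The variable names are illustrative, and the result is only an estimate of the mean, since it rests on the assumption that every value in a class equals the class midpoint.

```python
# (lower limit, upper limit, frequency) for each class in the table
table = [(20, 29, 9), (30, 39, 12), (40, 49, 3), (50, 59, 1)]

midpoints = [(low + high) / 2 for low, high, _ in table]
frequencies = [f for _, _, f in table]

n = sum(frequencies)
weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
grouped_mean = weighted_sum / n

print(midpoints)      # [24.5, 34.5, 44.5, 54.5]
print(n)              # 25
print(weighted_sum)   # 822.5
print(grouped_mean)   # 32.9
```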

Median

The median is the number that is in the middle of the list of data values (when they are written in order). We assume the data values are equal to the midpoints of their classes.

In this example, the long way to find the median is to list the data items in order like this:

24.5, 24.5, 24.5, 24.5, 24.5, 24.5, 24.5, 24.5, 24.5, 34.5, 34.5, 34.5, 34.5, 34.5, 34.5, 34.5, 34.5, 34.5, 34.5, 34.5, 34.5, 44.5, 44.5, 44.5, 54.5

Then determine which number is in the middle of the list. It is 34.5.

The shorter way to do this problem is to first determine the position of the median, which is equal to (n + 1)/2 where n is the number of data values.

In this example, n = 25, so the position of the median is (25 + 1)/2 = 26/2 = 13. So the position of the median is the 13th data value when the values are written in order. You do not actually have to write all the values, but rather look at the frequencies in the table.

In this example, 9 data values are in the first class. The 13th data value would be four more data values after that class. The second class contains 12 data values. So four data values after the first class is definitely in the second class. We are assuming that all values in the second class are equal to the midpoint 34.5. And so the median, which definitely lies in the second class, is said to be equal to 34.5.

Mode

The mode is the most frequently occurring data value. For data in a GFDT, find the mode by finding the class with the highest frequency. The midpoint of that class is said to be the mode.

In this example, the class with the highest frequency is the second class (frequency of 12 for the class 30-39). So the mode is the midpoint of that class, which is 34.5.
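
The "shorter way" for the median, and the mode rule, can be sketched in Python with a running total of the frequencies. Variable names are illustrative. This sketch only covers the case where (n + 1)/2 is a whole number, as it is here; for an even n the two middle positions would need to be averaged.

```python
# (class midpoint, frequency) pairs from the table
rows = [(24.5, 9), (34.5, 12), (44.5, 3), (54.5, 1)]

n = sum(frequency for _, frequency in rows)
position = (n + 1) / 2    # 13.0; a whole number because n = 25 is odd

# Walk through the classes until the running total reaches the position.
running_total = 0
median_estimate = None
for midpoint, frequency in rows:
    running_total += frequency
    if position <= running_total:
        median_estimate = midpoint
        break

# The mode estimate is the midpoint of the class with the highest frequency.
mode_estimate = max(rows, key=lambda row: row[1])[0]

print(position, median_estimate, mode_estimate)   # 13.0 34.5 34.5
```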

Module 1 “Self Check” on Measures of Center

1. Give the symbol and the formula for each:

Measure / symbol / formula
Sample Mean / ______ / ______
Population Mean / ______ / ______

2. a) Which measure of center has exactly half the data items less than or equal to it?

b) Which measure of center is affected by extreme data values?

c) Which measure of center can be used for categorical data?

3. The number of students in the primary schools in one county (rounded numbers) is given by this Stem Plot.

Stem| Leaf

(hundreds)| (tens)

1| 5 7 9

2| 4 8 8

3| 1 5 6 6 9

5| 2

Find the:

Mode

Median

Mean

4. • Given the histogram to the right of age data of some students, we cannot calculate the exact mode, mean, or median since the data is in classes.

• Suppose the researcher who knows the exact data values tells you that the median age is 22.

• In that case, it is clear that the mean of the exact ages is which of the following:

Less than 22 or

Greater than 22 or

Exactly 22

Answers:

1. Sample Mean: $\bar{x} = \frac{\sum x}{n}$ Population Mean: $\mu = \frac{\sum x}{N}$
2. a) median b) mean c) mode
3. mode = 280 and 360

median = 295 (there are 12 data items, so the median has 6 items less than it and 6 items larger than it. So the median is between item 280 and 310. Half way between them is 295.)

mean = 300 (Add the data values to get 3600, then divide by 12)

Notice that the mean is larger than the median since this data has an extremely large value (520).

4. The mean is greater than 22 because the data is skewed to the right, so the more extreme values among the larger numbers (ages in the 50s) affect the mean more strongly and “pull” the mean in that direction.
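
For readers who want to recompute the stem plot question, here is a minimal Python sketch. It assumes the reading of the plot given in the question (stems are hundreds, leaves are tens) and uses the standard-library `statistics` module (`multimode` requires Python 3.8 or later); the dictionary name `stem_plot` is illustrative.

```python
import statistics

# stem (hundreds) -> leaves (tens), copied from the stem plot
stem_plot = {1: [5, 7, 9], 2: [4, 8, 8], 3: [1, 5, 6, 6, 9], 5: [2]}

values = sorted(stem * 100 + leaf * 10
                for stem, leaves in stem_plot.items()
                for leaf in leaves)

print(values)
print(statistics.multimode(values))   # [280, 360]
print(statistics.median(values))      # 295.0
print(sum(values) / len(values))      # 300.0
```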

Module 1 – Measures of Center: More Examples

1. Finding Mode, Mean, and Median

a) A sample of the people on an elevator have these weights, in pounds:

155, 120, 145, 230, 160

Mode =

Mean =

(Remember, to find median, what do you do first?)

Median =

b) What if the 160 pound guy says “oops – I lied – I actually weigh 220”?

i) PREDICT before calculating: Does that change the mean or median? If so, predict how.

New data, in order: 120, 145, 155, 220, 230

ii) Calculate:

Mean =

Median =

c) What if another person is added to the sample, weighing 135?

i) PREDICT before calculating: how will the mean and median change (if they do change)?

New data, in order: 120, 135, 145, 155, 220, 230

ii) Calculate:

Mean =

Median =

2. Weighted Mean for Grouped Data – from a GFDT (Grouped Frequency Distribution Table)

The employees of one company have these hourly wages.

This data is in a frequency table where there are classes (or categories) of data grouped together. We don’t know the exact data. But we can find approximate measures of center from the information we have.

Hourly Wages / frequency
$10 – 14 / 3
$15 – 19 / 10
$20 – 24 / 4
$25 – 29 / 2

We treat the data as if every number in a class is at the midpoint of the class.

So, first we find the midpoint of each class. Then we take a weighted mean.

a) What is the weighted mean?

b) What is the median?

c) What is the mode?

Answers:

1. a) There is no Mode; Mean =162; Median = 155 (remember to first put numbers in order of size)

b) i) Since 220 is larger than 160, the mean will increase. Since 160 was above the middle value and 220 is too, the median won’t change.

b) ii) Mean = 174, Median = 155

c) i) Adding a lower-than-mean weight person will make the mean decrease. Median might go down (can’t say for sure)

c) ii) Mean = 167.5, Median = 150 [since that is half way between 145 and 155]

2. a) Find each midpoint, multiply each by its frequency, then divide by sum of frequency:

(12•3 + 17•10 + 22•4 + 27•2) / (3 + 10 + 4 + 2) = 348/19 ≈ 18.32, so the mean ≈ 18.32

b) Median’s position is (n+1)/2 = (19+1)/2 = 10th. Think about listing the midpoints in order. It would be:

12, 12, 12, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 22, 22, 22, 22, 27, 27

The tenth item in the list is 17. So our estimate of the median is 17.

Note: there are other ways to estimate the median involving interpolation. We will just do this quick estimate.

c) One way to estimate grouped data Mode is the midpoint of the class with the highest frequency. Mode = 17
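
The hourly-wage answers can be recomputed with a short Python sketch that literally "lists the midpoints in order", repeating each midpoint as many times as its frequency, and then uses the standard-library `statistics` module. Variable names are illustrative; all three results are estimates based on the midpoint assumption.

```python
import statistics

midpoints = [12, 17, 22, 27]
frequencies = [3, 10, 4, 2]

# Repeat each midpoint as many times as its frequency.
expanded = [m for m, f in zip(midpoints, frequencies) for _ in range(f)]

print(len(expanded))                         # 19
print(round(statistics.mean(expanded), 2))   # 18.32
print(statistics.median(expanded))           # 17
print(statistics.mode(expanded))             # 17
```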

Calculator commands for finding mean and median

using Lists and the statistical function “1-Var-Stats”

(Full detailed directions are on Wamap, in the block of Calculator instructions.)

• For a list of data

Example data we can use: 120, 135, 145, 155, 220, 230

a) Enter the data into a list:

STAT button,

select Edit,

enter data in a list, let’s say in L1

b) Calculate values:

STAT button,

right arrow to CALC,

select “1-Var-Stats”

When “1-Var-Stats” appears on the home screen…

• If you simply press Enter, the calculator uses the data in L1.

• If your data is in a different list, you must tell it which list the data is in.

To get the list name to appear on the screen, notice that L1, L2, L3, L4, etc. are printed above the number keys on the calculator (above 1, 2, 3, 4, etc.). So press the “2nd” key and then the number you need.

• For data in a frequency table:

Midpoint / Hourly Wages / frequency
12 / $10 – 14 / 3
17 / $15 – 19 / 10
22 / $20 – 24 / 4
27 / $25 – 29 / 2

Enter the midpoints in one list.

Enter the frequencies in another list.

(the two lists must “match” – that is, the numbers must be in the order so matched pairs are next to each other.)

When you get the “1-Var-Stats” command to appear on the home screen,

then you must specify the two lists, with a comma between them.

Note: the comma key is above the number 7 key.

If the data (in this case that is the midpoints) is in L1 and the frequencies are in L2, then the command should look like this:

1-Var-Stats L1,L2 then press Enter

Note: if you ever get the error “DIM MISMATCH”

- That means the dimensions (or lengths) of your two lists do not match. So go back to the lists and fix them.

- If you get this error message when you weren’t even trying to use lists (probably you were trying to get a graph or statistical plot), go to the “y=” screen and see whether any of the “Plot”s listed at the top are highlighted; if so, un-highlight them. [Sometimes a Plot is still turned on when you are trying to do something else. If the lists for that Plot have mismatched dimensions, the calculator reports the error even though you do not want the plot. So you must turn off the plot. Or, if you wanted the plot, fix the lists.]
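
For readers working without the calculator, the idea of "a data list plus an optional frequency list" can be imitated in Python. This is only a rough analogue under stated assumptions, not how the calculator works internally: the function name `one_var_stats` is hypothetical, it reports just n, mean, and median (the calculator reports more), and its error message merely echoes the calculator's wording.

```python
import statistics

def one_var_stats(data, frequencies=None):
    """Return n, mean, and median; `frequencies` is an optional second list."""
    if frequencies is not None:
        if len(data) != len(frequencies):
            raise ValueError("DIM MISMATCH: the two lists have different lengths")
        # Repeat each data value as many times as its frequency.
        data = [x for x, f in zip(data, frequencies) for _ in range(f)]
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
    }

# A plain list of data (like using L1 alone)
print(one_var_stats([120, 135, 145, 155, 220, 230]))

# Midpoints with matching frequencies (like L1,L2)
print(one_var_stats([12, 17, 22, 27], [3, 10, 4, 2]))
```

Passing lists of different lengths, for example four midpoints with only three frequencies, raises the error instead of silently dropping a value.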

Comparing measures of center in histograms

When the data distribution is approximately symmetric, then the mean and median are approximately equal.

Below are examples of the effect on mean and median of skewed data distributions.

[Info in OLI text page 21.]

Example of a skewed-right distribution

[Histogram: frequency on the vertical axis, Data Values on the horizontal axis]

Mode

• One approach: Mode is the class with the greatest frequency. Two classes are the mode here. Class from 100-150 and Class from 150-200.

• Another approach: The mode is the Midpoint of the class(es) with the greatest frequency. In this case there would be two modes since two classes have equal greatest frequency. The modes are 125 and 175 (since those are the midpoints of the modal classes).

Median

• The median is the location along the horizontal axis where half the data values are less than it and half the data values are greater than it. This is equivalent to saying that half the area of the histogram is to the left and half the area is to the right. Clearly the value of 150 (which is the middle of the Mode values) has more area to the right of it than to the left. So the median is larger than 150.

Mean

• The mean is the measure of center that is strongly affected by extreme values. In this histogram there are some extreme values to the right (these are “extreme” in the sense that they are clearly farther from the median than any other data values). The mean will be “pulled to the right” by these extreme data values on the right. When we say “pulled to the right” we are indicating that the mean will be pulled to the right of the median.

Mean v. Median

• Suppose someone says “the two numbers 225 and 255 are measures of center, but I don’t know which is the mean and which is the median”. We can be sure that 225 is the median and 255 is the mean – because the mean is “pulled to the right” by the extreme values on the right, so the mean is greater than the median.
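
The "pull" on the mean can be checked numerically. The three small data sets below are made up for illustration (they are not the data behind the histograms in this handout); the sketch compares mean and median for a symmetric, a right-skewed, and a left-skewed list using the standard-library `statistics` module.

```python
import statistics

symmetric = [1, 2, 3, 3, 3, 4, 5]
skewed_right = [1, 2, 2, 3, 3, 3, 4, 4, 20]          # one extreme value on the right
skewed_left = [1, 17, 17, 18, 18, 18, 19, 19, 20]    # one extreme value on the left

for name, data in [("symmetric", symmetric),
                   ("skewed right", skewed_right),
                   ("skewed left", skewed_left)]:
    print(name, round(statistics.mean(data), 2), statistics.median(data))
```

In these lists the mean equals the median for the symmetric data, is larger than the median for the right-skewed data, and is smaller than the median for the left-skewed data.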

Example of a skewed-left distribution

Salary histograms are more typically skewed right, but this one is skewed left (perhaps some part-time employees account for that).

From:

Mode

• The modal class is 55-65 (the class with the greatest frequency)

• The modal value determined as the midpoint of that class is 60.

Median

• The median is the location along the horizontal axis where half the data values are less than it and half the data values are greater than it. This is equivalent to saying that half the area of the histogram is to the left and half the area is to the right. Clearly the value of 60 (which is the modal value) has more area to the left of it than to the right. So the median salary is less than 60.