Unit 2 Chapter 2 Summary

Section 2.1

Constructing a frequency table

  1. Determine the number of classes and determine the class width.

Class Width = $\dfrac{\text{largest data value} - \text{smallest data value}}{\text{number of classes}}$

Increase this value to the next whole number.

  2. Create lower class limits and upper class limits using the class width. To set up each interval properly, take the lower class limit, add the class width, then subtract 1 to get the upper class limit. Each following lower class limit is the previous lower class limit plus the class width.
  3. Tally the data with tick marks as you classify each value into its class. Then complete a table column entitled Frequency with the actual count for each class.
  4. Compute the midpoint (class mark) for each class. Do not round this value.

Midpoint = $\dfrac{\text{lower class limit} + \text{upper class limit}}{2}$

  5. Determine class boundaries. To find the upper class boundary, add 0.5 to the upper class limit. To find the lower class boundary, subtract 0.5 from the lower class limit.
  6. Compute the relative frequency for each class: the class frequency divided by the total number of data values.
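The steps above can be sketched in Python as follows. This is only an illustration, not the course's prescribed procedure: the function name `frequency_table`, the sample values, and the choice of 4 classes are hypothetical, and the data are assumed to be whole numbers so that the "subtract 1" and "0.5" rules apply. The width is always pushed up to the next whole number, even when the quotient is already whole, so that the largest value fits in the last class.

```python
import math

def frequency_table(data, num_classes):
    """Build a frequency table for whole-number data (hypothetical helper)."""
    low, high = min(data), max(data)
    # Step 1: class width, increased to the next whole number.
    width = math.floor((high - low) / num_classes) + 1
    n = len(data)
    rows = []
    for i in range(num_classes):
        lower = low + i * width        # Step 2: lower class limit
        upper = lower + width - 1      # Step 2: upper class limit
        freq = sum(lower <= value <= upper for value in data)  # Step 3
        rows.append({
            "limits": (lower, upper),
            "frequency": freq,
            "midpoint": (lower + upper) / 2,             # Step 4
            "boundaries": (lower - 0.5, upper + 0.5),    # Step 5
            "relative_frequency": freq / n,              # Step 6
        })
    return width, rows

# Hypothetical sample data, invented for illustration only.
sample = [1, 3, 4, 7, 8, 9, 12, 13, 15, 18, 19, 20]
width, rows = frequency_table(sample, 4)
for row in rows:
    print(row)
```

The relative frequencies should add up to 1, which is a quick check that every data value landed in exactly one class.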

Follow the directions in docsharing to construct a histogram and relative frequency histogram.

Section 2.2

Other types of graphs

  • Stem-and-leaf displays
  • Dot plots
  • Pareto charts
  • Circle graphs (pie chart)
  • Time-series graphs

A stem-and-leaf display is a method of displaying data that rank-orders the values and arranges them into groups. Be sure to include any stems that are empty, as shown by the “6” stem in the example.

Example: for the data shown below

12 23 56 25 15 45 35 25 32 14 10 18 29 43 75 71 13

Stems / Leaves
1 / 0 2 3 4 5 8
2 / 3 5 5 9
3 / 2 5
4 / 3 5
5 / 6
6 / Note empty stem
7 / 1 5
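The display above can be reproduced with a short Python sketch. The function name `stem_and_leaf` is hypothetical, and the sketch assumes two-digit whole numbers with the tens digit as the stem and the ones digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: sorted leaves}, keeping empty stems (hypothetical helper)."""
    leaves = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)   # tens digit is the stem, ones digit the leaf
        leaves[stem].append(leaf)
    # Include every stem from the smallest to the largest, even if it has no leaves.
    return {stem: leaves.get(stem, [])
            for stem in range(min(leaves), max(leaves) + 1)}

# The example data; the first two values are read as 12 and 23,
# which matches the leaves shown in the display.
data = [12, 23, 56, 25, 15, 45, 35, 25, 32, 14, 10, 18, 29, 43, 75, 71, 13]
plot = stem_and_leaf(data)
for stem, leaf_list in plot.items():
    print(stem, "/", " ".join(str(leaf) for leaf in leaf_list))
```

Sorting the data first means the leaves on each stem come out in ascending order, which is what makes the display rank-ordered.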

Section 2.3

Measures of Central Tendency: Mode, Median, Mean

The two data sets used in this section are shown below.

Data set x = {19, 18, 23, 19, 25, 27} Data set y = {2, 3, 4, 3, 2, 5, 6}

MODE of a data set is the value that occurs most frequently.

For data set x, the MODE is 19. For data set y, the MODES are 2 and 3. This last situation is called BIMODAL.

MEDIAN is the central value of an ordered distribution. The concept of order indicates that the measure is positional. So the first step to take is to rewrite the data set in ascending order.

Data set x = {18, 19, 19, 23, 25, 27} Data set y = {2, 2, 3, 3, 4, 5, 6}

Data set x has 6 items. Data set y has 7 items.

To find the median,

  1. If the number of items is odd, the median is the middle item.

Data set y has an odd number of items, 7. Therefore the MEDIAN = 3

  2. If the number of items is even, the median is the average of the middle two items.

Data set x has an even number of items, 6. The middle 2 values are 19 and 23.

Therefore MEDIAN = $\dfrac{19 + 23}{2} = 21$

MEAN is the arithmetic average of all the data values.

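Mean = $\dfrac{\text{sum of all data values}}{\text{number of data values}}$

Mean of data set x = $\dfrac{131}{6} \approx 21.8$ (rounded to 1 decimal place)

Mean of data set y = $\dfrac{25}{7} \approx 3.6$ (rounded to 1 decimal place)

The proper formulas use summation notation, $\Sigma$.

Sample Mean = $\bar{x} = \dfrac{\Sigma x}{n}$, where $\bar{x}$ is pronounced “x-bar” and n is the number of values in the sample.

Population Mean = $\mu = \dfrac{\Sigma x}{N}$, where $\mu$ is pronounced “mu” and N is the number of values in the population.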
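As a cross-check on the hand calculations in this section, the three measures can be computed with Python's standard `statistics` module. This is a sketch for verification only; it assumes Python 3.8 or later, where `statistics.multimode` is available.

```python
from statistics import mean, median, multimode

x = [19, 18, 23, 19, 25, 27]
y = [2, 3, 4, 3, 2, 5, 6]

for name, data in (("x", x), ("y", y)):
    print(
        f"Data set {name}:",
        "mode(s) =", multimode(data),    # returns every most-frequent value
        "median =", median(data),        # sorts the data internally
        "mean =", round(mean(data), 1),  # rounded to 1 decimal place
    )
```

`multimode` returns a list, so a bimodal data set such as y yields two values rather than one.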

Section 2.4

Measures of Variation: Range, Standard Deviation, Variance

Measures of variation show the spread of data or the spread of data about the mean.

RANGE is the difference between the largest and smallest values of a data set.

Data set x = {18, 19, 19, 23, 25, 27} Data set y = {2, 2, 3, 3, 4, 5, 6}

Data set x has 6 items. Data set y has 7 items.

Mean of x = 21.8 Mean of y = 3.6

Range of x = 27 – 18 = 9 Range of y = 6 – 2 = 4

The range shows the spread of the data but not how that spread relates to the mean. The standard deviation and variance show the spread of the data about the mean. As with the mean, there is a sample standard deviation and a population standard deviation.

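Method 1

SAMPLE VARIANCE: $s^2 = \dfrac{\Sigma (x - \bar{x})^2}{n - 1}$

SAMPLE STANDARD DEVIATION: $s = \sqrt{\dfrac{\Sigma (x - \bar{x})^2}{n - 1}}$ or $s = \sqrt{s^2}$

Method 2

SAMPLE VARIANCE: $s^2 = \dfrac{\Sigma x^2 - (\Sigma x)^2 / n}{n - 1}$

SAMPLE STANDARD DEVIATION: $s = \sqrt{\dfrac{\Sigma x^2 - (\Sigma x)^2 / n}{n - 1}}$ or $s = \sqrt{s^2}$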

Method 1 for data set x:

x / x - mean / (x - mean)^2
18 / -3.8 / 14.7
19 / -2.8 / 8.0
19 / -2.8 / 8.0
23 / 1.2 / 1.4
25 / 3.2 / 10.0
27 / 5.2 / 26.7

Sum of (x - mean)^2 = 68.8

n = 6
Mean of x = 21.8
s^2 = 13.8
s = 3.7

Data set x: $s^2 = \dfrac{68.8}{6 - 1} \approx 13.8$ and $s = \sqrt{13.8} \approx 3.7$

Method 1 for data set y:

y / y - mean / (y - mean)^2
2 / -1.6 / 2.5
2 / -1.6 / 2.5
3 / -0.6 / 0.3
3 / -0.6 / 0.3
4 / 0.4 / 0.2
5 / 1.4 / 2.0
6 / 2.4 / 5.9

Sum of (y - mean)^2 = 13.7

n = 7
Mean of y = 3.6
s^2 = 2.3
s = 1.5

Data set y: $s^2 = \dfrac{13.7}{7 - 1} \approx 2.3$ and $s = \sqrt{2.3} \approx 1.5$

Method 2 for data set x:

x / x^2
18 / 324
19 / 361
19 / 361
23 / 529
25 / 625
27 / 729
Σx = 131 / Σx^2 = 2929

$s^2 = \dfrac{2929 - (131)^2 / 6}{6 - 1} \approx 13.8$ and $s = \sqrt{13.8} \approx 3.7$

Method 2 for data set y:

y / y^2
2 / 4
2 / 4
3 / 9
3 / 9
4 / 16
5 / 25
6 / 36
Σy = 25 / Σy^2 = 103

$s^2 = \dfrac{103 - (25)^2 / 7}{7 - 1} \approx 2.3$ and $s = \sqrt{2.3} \approx 1.5$
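Both methods should give the same variance, which can be confirmed with a short Python sketch. The function names are hypothetical. Method 1 is taken to be the defining formula, the sum of squared deviations from the mean divided by n - 1, and Method 2 the computation formula built from Σx and Σx^2, as the column headings of the tables suggest.

```python
import math

def sample_variance_method1(data):
    """Defining formula: sum of (x - mean)^2, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((value - mean) ** 2 for value in data) / (n - 1)

def sample_variance_method2(data):
    """Computation formula: (sum of x^2 - (sum of x)^2 / n), divided by n - 1."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(value * value for value in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

x = [18, 19, 19, 23, 25, 27]
y = [2, 2, 3, 3, 4, 5, 6]

for name, data in (("x", x), ("y", y)):
    v1 = sample_variance_method1(data)
    v2 = sample_variance_method2(data)
    print(f"Data set {name}: s^2 = {v1:.1f} (Method 1), {v2:.1f} (Method 2), "
          f"s = {math.sqrt(v1):.1f}")
```

Method 2 avoids computing each deviation from the mean, which is why it is convenient by hand. The code works with the unrounded mean, so its intermediate values can differ slightly from the rounded table entries.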

There is a corresponding standard deviation and variance for the population. The population standard deviation is denoted $\sigma$, pronounced “sigma”. The population variance is $\sigma^2$, called “sigma squared”. The formulas for these are found on page 88.

Data sets x and y are quite different and would be difficult to compare using the measures produced so far. To compare the spread of different data sets, we can use the coefficient of variation.
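The coefficient of variation is commonly defined as the standard deviation expressed as a percentage of the mean, CV = (s / x-bar) · 100%. That formula is not written out in this summary, so the sketch below is an assumption based on the standard definition, and the function name is hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(data) / mean(data) * 100

x = [18, 19, 19, 23, 25, 27]
y = [2, 2, 3, 3, 4, 5, 6]

cv_x = coefficient_of_variation(x)
cv_y = coefficient_of_variation(y)
print(f"CV of x = {cv_x:.1f}%")
print(f"CV of y = {cv_y:.1f}%")
```

With these data the result is roughly 17.0% for x and 42.3% for y. So although x has the larger standard deviation, y is more spread out relative to its mean.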

Chebyshev’s Theorem

This theorem is used to show the data spread about the mean.

Results of Chebyshev’s Theorem: For any set of data,

  • At least 75% of the data fall in the interval from $\mu - 2\sigma$ to $\mu + 2\sigma$
  • At least 88.9% of the data fall in the interval from $\mu - 3\sigma$ to $\mu + 3\sigma$
  • At least 93.8% of the data fall in the interval from $\mu - 4\sigma$ to $\mu + 4\sigma$

Using $\bar{x}$ to estimate $\mu$ and s to estimate $\sigma$, we can draw some conclusions about data set x.

At least 75% of the data must fall within 2 standard deviations of the mean.

21.8 – 2(3.7) to 21.8 + 2(3.7)

14.4 to 29.2
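The percentages listed above come from the general form of Chebyshev's Theorem: for any k > 1, at least 1 - 1/k^2 of the data lie within k standard deviations of the mean. The summary lists only the cases k = 2, 3, and 4, so the general formula is added here as the standard statement of the theorem. The Python sketch below uses hypothetical function names to reproduce the percentages and the interval for data set x.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem is only informative for k > 1")
    return 1 - 1 / k ** 2

def chebyshev_interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd

for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_bound(k):.2%} of the data")

# Data set x, using the rounded sample mean 21.8 and sample standard deviation 3.7.
x = [18, 19, 19, 23, 25, 27]
low, high = chebyshev_interval(21.8, 3.7, 2)
inside = sum(low <= value <= high for value in x) / len(x)
print(f"Interval: {low:.1f} to {high:.1f}, proportion of x inside: {inside:.0%}")
```

Every value of data set x lies between 14.4 and 29.2, which is consistent with the theorem's guarantee of at least 75%. The theorem gives a lower bound, not an exact percentage.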

Section 2.5

Percentiles and Five-Number Summary

For a whole number P, where $1 \le P \le 99$, the Pth percentile of a distribution is a value such that P% of the data fall at or below it and (100 – P)% of the data fall at or above it.

Quartiles are special percentiles. The 25th percentile is the first quartile Q1, the 50th percentile is the second quartile Q2, and the 75th percentile is the third quartile Q3. The second quartile Q2 is the same as the median.

Data set D is shown. It has been arranged in ascending order.

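2 / 5 / 7 / 8 / 8 / 11 / 12 / 14 / 20 / 23
23 / 25 / 26 / 27 / 28 / 29 / 31 / 36 / 36 / 42

  1. Find the median, which is Q2. This data set has 20 items, so the median falls between the 10th and 11th items.

Median = $\dfrac{23 + 23}{2} = 23$

  2. Find Q1. This is the median of the lower half of the data (the 1st through 10th items), which falls between the 5th and 6th items.

Q1 = $\dfrac{8 + 11}{2} = 9.5$

  3. Find Q3. This is the median of the upper half of the data (the 11th through 20th items), which falls between the 15th and 16th items.

Q3 = $\dfrac{28 + 29}{2} = 28.5$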

Now the five-number summary can be given.

Lowest value = 2, Q1 = 9.5, Median (Q2) = 23, Q3 = 28.5, Highest value = 42

Interquartile Range IQR = Q3 – Q1

The interquartile range for data set D is IQR = 28.5 – 9.5 = 19

The interquartile range measures the spread of the middle half of the data. It is used to check whether any extremely large or small values may have too much influence on the data analysis.
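The quartile procedure for data set D can be sketched in Python as follows. The function name `five_number_summary` is hypothetical. For a data set with an odd number of items, the sketch leaves the median out of both halves; that is one common convention and an assumption here, since data set D has an even number of items and the summary does not cover the odd case.

```python
from statistics import median

def five_number_summary(data):
    """Return (lowest, Q1, median, Q3, highest) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Assumption: when n is odd, the middle item belongs to neither half.
    upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

D = [2, 5, 7, 8, 8, 11, 12, 14, 20, 23,
     23, 25, 26, 27, 28, 29, 31, 36, 36, 42]

summary = five_number_summary(D)
lowest, q1, q2, q3, highest = summary
iqr = q3 - q1
print("Five-number summary:", summary)
print("IQR =", iqr)
```

Other software may use different quartile conventions and return slightly different values for Q1 and Q3, so results from a calculator or spreadsheet may not match the hand method exactly.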