Chapter 3 Part B
Descriptive Statistics: Numerical Methods

A. Measures of Relative Location and Detecting Outliers

1. Skewness

2. z-Scores

  • The z-score is often called the standardized value.
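  • It denotes the number of standard deviations a data value $x_i$ is from the mean: $z_i = \frac{x_i - \bar{x}}{s}$, where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.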
  • A data value less than the sample mean will have a z-score less than zero.
  • A data value greater than the sample mean will have a z-score greater than zero.
  • A data value equal to the sample mean will have a z-score of zero.
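
The z-score definition above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `z_scores` and the sample values are assumptions made for the example, and the sample standard deviation (n - 1 divisor) is assumed.

```python
from statistics import mean, stdev

def z_scores(data):
    """Return the z-score of each value: (x_i - sample mean) / sample std dev."""
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 divisor)
    return [(x - x_bar) / s for x in data]

# Hypothetical sample, not taken from the slides.
sample = [2, 4, 6, 8, 10]
for x, z in zip(sample, z_scores(sample)):
    print(f"x = {x:>2}  z = {z:+.3f}")
```

Values below the mean come out negative, values above it positive, and a value equal to the mean gets a z-score of zero.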

3. Chebyshev’s Theorem

At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.

  • At least 75% of the items must be within z = 2 standard deviations of the mean.
  • At least 89% (8/9, or about 88.9%) of the items must be within z = 3 standard deviations of the mean.
  • At least 94% (15/16, or 93.75%) of the items must be within z = 4 standard deviations of the mean.
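
The percentages above follow directly from the bound 1 - 1/z^2. A short sketch (the function name `chebyshev_bound` is an assumption for this example) evaluates it for z = 2, 3, and 4:

```python
def chebyshev_bound(z):
    """Minimum fraction of items within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2

for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_bound(z):.2%} of the items")
```

The bound holds for any data set, whatever its shape, which is why it is weaker than the empirical rule.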

4. Empirical Rule

For data having a bell-shaped distribution:

  • Approximately 68% of the data values will be within one standard deviation of the mean.
  • Approximately 95% of the data values will be within two standard deviations of the mean.
  • Almost all (99.7%) of the items will be within three standard deviations of the mean.
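
As a rough check on these percentages, the sketch below assumes the bell-shaped distribution is exactly normal (an assumption; the rule itself applies approximately to any bell-shaped data) and uses Python's standard-library `NormalDist`. The helper name `normal_fraction_within` is made up for the example.

```python
from statistics import NormalDist

def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_fraction_within(k):.4f}")
```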

5. Detecting Outliers

  • An outlier is an unusually small or unusually large value in a data set.
  • A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
  • It might be an incorrectly recorded data value.
  • It might be a data value that was incorrectly included in the data set.
  • It might be a correctly recorded data value that belongs in the data set!
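
The z-score screen described above might look like the following. The function name `flag_outliers`, the default threshold of 3, and the data are assumptions for illustration. A flagged value is only a candidate for review, since it may be a legitimate observation.

```python
from statistics import mean, stdev

def flag_outliers(data, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 divisor)
    return [x for x in data if abs((x - x_bar) / s) > threshold]

# Hypothetical data: 19 values between 12 and 15, plus one suspicious value.
data = [12, 14, 13, 15, 14, 13, 12, 15, 14, 13,
        14, 15, 13, 12, 14, 13, 15, 14, 13, 60]
print(flag_outliers(data))
```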

B. Exploratory Data Analysis

1. Five-Number Summary

  • Smallest Value
  • First Quartile
  • Median
  • Third Quartile
  • Largest Value
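
A five-number summary can be computed as sketched below. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of Python's `statistics.quantiles`, which may not match the convention used in the course. The function name `five_number_summary` is an assumption.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest) for the data."""
    ordered = sorted(data)
    # n=4 splits the data into quartiles and returns the three cut points.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical data, not taken from the slides.
print(five_number_summary([9, 1, 8, 2, 7, 3, 6, 4, 5]))
```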

2. Box Plot

  • A box is drawn with its ends located at the first and third quartiles.
  • A vertical line is drawn in the box at the location of the median.
  • Limits are located (not drawn) using the interquartile range (IQR).
  • The lower limit is located 1.5(IQR) below Q1.
  • The upper limit is located 1.5(IQR) above Q3.
  • Data outside these limits are considered outliers.
  • Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
  • The location of each outlier is shown with the symbol * .
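
The box plot construction above reduces to a few calculations, sketched here without any drawing. The function names and data are assumptions, and the quartiles again use the "inclusive" method of `statistics.quantiles`, which may differ from the course's convention.

```python
from statistics import quantiles

def box_plot_limits(data):
    """Return (lower limit, upper limit), each 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def box_plot_outliers(data):
    """Return the data values that fall outside the limits."""
    lower, upper = box_plot_limits(data)
    return [x for x in data if x < lower or x > upper]

def whisker_ends(data):
    """Return the smallest and largest data values inside the limits."""
    lower, upper = box_plot_limits(data)
    inside = [x for x in data if lower <= x <= upper]
    return min(inside), max(inside)

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
print(box_plot_limits(data), box_plot_outliers(data), whisker_ends(data))
```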

C. Measures of Association Between Two Variables

1. Covariance

  • The covariance is a measure of the linear association between two variables.
  • Positive values indicate a positive relationship.
  • Negative values indicate a negative relationship.
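  • If the data sets are samples, the covariance is denoted by $s_{xy}$: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
  • If the data sets are populations, the covariance is denoted by $\sigma_{xy}$: $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$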
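
A minimal sketch of both covariances, assuming the usual definitions (sum of products of deviations from the means, divided by n - 1 for a sample and by N for a population). Function names and data are hypothetical.

```python
def _sum_of_cross_deviations(x, y):
    """Sum of (x_i - mean of x) * (y_i - mean of y) over paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

def sample_covariance(x, y):
    return _sum_of_cross_deviations(x, y) / (len(x) - 1)

def population_covariance(x, y):
    return _sum_of_cross_deviations(x, y) / len(x)

# Hypothetical paired data: y rises with x, so the covariance is positive.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(sample_covariance(x, y), population_covariance(x, y))
```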

2. Correlation Coefficient

  • The coefficient can take on values between -1 and +1.
  • Values near -1 indicate a strong negative linear relationship.
  • Values near +1 indicate a strong positive linear relationship.
  • If the data sets are samples, the coefficient is $r_{xy} = \frac{s_{xy}}{s_x s_y}$.
  • If the data sets are populations, the coefficient is $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$.
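
The sample coefficient can be sketched as the sample covariance divided by the product of the two sample standard deviations (the Pearson product-moment definition, assumed here). The function name and data are hypothetical.

```python
from math import sqrt

def sample_correlation(x, y):
    """Sample covariance of x and y divided by the product of their sample std devs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical data lying exactly on a rising line, then on a falling line.
x = [1, 2, 3, 4, 5]
print(sample_correlation(x, [2, 4, 6, 8, 10]))
print(sample_correlation(x, [10, 8, 6, 4, 2]))
```

Unlike the covariance, the result does not depend on the units of measurement of x and y.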

D. The Weighted Mean and Working with Grouped Data

1. Weighted Mean

  • When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
  • In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.
  • When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$

where:

$x_i$ = value of observation i

$w_i$ = weight for observation i
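
The GPA case can be sketched as follows, taking the weighted mean as the sum of weight times value divided by the sum of the weights. The function name, grade points, and credit hours are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical transcript: grade points weighted by credit hours.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 3]
print(weighted_mean(grade_points, credit_hours))
```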

3. Mean for Grouped Data

  • The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data.
  • To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
  • We compute a weighted mean of the class midpoints using the class frequencies as weights.
  • Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.
  • Sample Data: $\bar{x} = \frac{\sum f_i M_i}{n}$
  • Population Data: $\mu = \frac{\sum f_i M_i}{N}$

where:

$f_i$ = frequency of class i

$M_i$ = midpoint of class i
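
A short sketch of the grouped-data mean, treating each class midpoint as the value of every item in its class. The function name and the frequency distribution are hypothetical; the same calculation serves sample and population data, since only the notation (n versus N) differs.

```python
def grouped_mean(midpoints, frequencies):
    """Weighted mean of class midpoints, using class frequencies as weights."""
    if len(midpoints) != len(frequencies):
        raise ValueError("each class needs one midpoint and one frequency")
    n = sum(frequencies)  # total number of items across all classes
    return sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Hypothetical classes 0-10, 10-20, 20-30 with their frequencies.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
print(grouped_mean(midpoints, frequencies))
```

The result is an approximation, because the individual values inside each class are not known.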

4. Variance for Grouped Data

  • Sample Data: $s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$
  • Population Data: $\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$

5. Standard Deviation for Grouped Data

  • Sample Data: $s = \sqrt{s^2}$
  • Population Data: $\sigma = \sqrt{\sigma^2}$
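
The grouped-data variance and standard deviation can be sketched together. The sketch assumes the frequency-weighted squared deviations of the class midpoints from the grouped mean, divided by n - 1 for a sample and by N for a population. Function names, the `sample` flag, and the data are hypothetical.

```python
from math import sqrt

def grouped_variance(midpoints, frequencies, sample=True):
    """Approximate variance from class midpoints and class frequencies."""
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    # Each squared deviation is weighted by its class frequency.
    ss = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))
    return ss / (n - 1 if sample else n)

def grouped_std_dev(midpoints, frequencies, sample=True):
    """Square root of the grouped-data variance."""
    return sqrt(grouped_variance(midpoints, frequencies, sample))

# Hypothetical classes 0-10, 10-20, 20-30 with their frequencies.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
print(grouped_variance(midpoints, frequencies))
print(grouped_std_dev(midpoints, frequencies, sample=False))
```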