Study Guide on Data

(e.stat sections are noted in parentheses)

Questions to ask at the beginning:

  1. How will I find data to address a given question?
  2. What numerical form will the data take: nominal, ordinal, interval, or cardinal?
  3. Should the data be recoded or transformed, for example, to percentage changes?
  4. Are the data repeated observations of a common phenomenon?
  5. How might the data best be represented in a chart?
  6. What few statistics might best represent the whole data set?

Sources of Data

  1. Published data is less expensive than gathering one’s own. Use the World Wide Web (03-06) or the library (03-05). The Statistical Abstract of the United States is often a good place to start. (The Statistical Abstract is on the web at http://www.census.gov/statab/www/.) Ask a reference librarian for help.
  2. When published data isn’t sufficient, gather your own. Direct observation or a survey in person, by telephone, or by mail may yield useful data. (03-04)
  3. An experiment is a special form of direct observation in which the subjects are manipulated in some fashion with observation designed to reveal the consequences of the manipulation. (03-03)
  4. In chapter 10, e.stat discusses issues in finding samples of observations to represent a whole population.

Numerical Form (02-13), Transformation, and Frequency

How we use data will depend on its numerical form. Is the data discrete or continuous?

  1. If the data consists of only a small number of whole numbers or a few individual categories, it is discrete. Discrete data may be nominal or ordinal.
  2. If the data take values along a range of the real number line, they are continuous. Even if the data are just whole numbers, when there are many different whole numbers we may treat the data as continuous. Continuous data are either interval or cardinal.
  3. If the data are recorded as words, use an IF statement to recode to numbers. (02-13)
  4. When interested in percentage change, use the data to compute percentage change. (02-08) Other transformations may be useful in other settings.
  5. When the data consist of multiple observations of one phenomenon, summarize them with a frequency histogram. Define bins of uniform width and count how many observations fall in each bin. (04-03)
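
The recoding, percentage-change, and binning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the survey answers, the coding scheme, the price series, and the helper name histogram are all made up for the example and do not come from e.stat.

```python
# Hypothetical survey answers recorded as words (an ordinal scale).
answers = ["low", "high", "medium", "low", "high"]
codes = {"low": 1, "medium": 2, "high": 3}  # assumed coding scheme
recoded = [codes[a] for a in answers]

# Percentage change between consecutive observations of a made-up series.
prices = [100.0, 110.0, 99.0]
pct_change = [100.0 * (new - old) / old for old, new in zip(prices, prices[1:])]


def histogram(data, start, width, n_bins):
    """Count observations in n_bins bins of uniform width.

    Bin i covers start + i*width up to (but not including)
    start + (i+1)*width. Values outside all bins are ignored.
    """
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


observations = [1, 2, 2, 3, 5, 7, 8, 9]  # made-up observations
bin_counts = histogram(observations, start=0, width=5, n_bins=2)
```

Dividing each bin count by the total number of observations would give relative frequencies, which is what a relative frequency polygon plots.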

Chart

  1. Use a column chart to see a frequency count of discrete data. (02-04)
  2. Use a relative frequency polygon to see the frequency distribution of continuous data. (04-04)
  3. Use an X-Y chart to see data recorded over time. (02-10)
  4. Keep the chart simple and choose a design that emphasizes the important comparison.

Summary Statistics

Descriptive statistics are numbers that record attributes of the frequency distribution.

1. Center Point

  1. The arithmetic mean (04-06) is the first choice. It uses all the data and indicates the center of gravity of the frequency distribution.
  2. The median (04-09) may be a better choice for a skewed distribution.
  3. The weighted average (04-08) gives different weight to each observation.
  4. The geometric mean (04-14) averages growth rates; the harmonic mean (04-15) averages reciprocals; the mode (04-13) is the most frequent value, and the trimmed mean (04-11) omits extreme values.
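
As a rough illustration of the center-point measures listed above, the following Python sketch uses the standard library's statistics module (geometric_mean needs Python 3.8 or later). The data values are invented, and the helpers weighted_average and trimmed_mean are hypothetical names written for this example. Note that trimmed_mean here drops a fixed count of values from each end, which is one of several possible trimming conventions.

```python
import statistics

data = [2, 3, 3, 5, 12]  # made-up sample; the 12 creates a long right tail

mean = statistics.mean(data)      # uses every observation
median = statistics.median(data)  # middle value, less affected by the 12
mode = statistics.mode(data)      # most frequent value


def weighted_average(values, weights):
    """Average in which each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def trimmed_mean(values, k):
    """Drop the k smallest and k largest values, then average the rest."""
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Growth factors for +10% followed by -10% (assumed example).
avg_growth = statistics.geometric_mean([1.10, 0.90])

# Average speed over two equal distances driven at 40 and 60 (assumed example).
avg_speed = statistics.harmonic_mean([40, 60])
```

In this sample the mean (5) sits above the median (3), which is the pattern expected when a few large values pull the mean toward the right tail.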

2. Dispersion

  1. The sum of the squared deviations from the mean divided by the number of observations minus one is the sample variance. The square root of the variance is the standard deviation (04-16).
  2. The range (maximum minus minimum) is an alternative measure of dispersion (04-12).
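
The variance definition above translates directly into code. The sketch below is a minimal Python version on made-up data; sample_variance is a name chosen for this example, and the standard library's statistics.variance is shown only as a cross-check.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample with mean 5


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1)


variance = sample_variance(data)
std_dev = math.sqrt(variance)        # standard deviation
data_range = max(data) - min(data)   # range: maximum minus minimum

# Cross-check against the standard library's sample variance.
library_variance = statistics.variance(data)
```

For this sample the squared deviations sum to 32, so the sample variance is 32/7.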

3. Skewness

Measure the symmetry of a frequency distribution with a skewness statistic.

  1. An informal indication of skewness is the mean minus the median. The mean is drawn toward a long tail by extreme values.
  2. Excel’s SKEW function returns a formal measure of skewness. Negative skew means the distribution has a long left tail. Positive skew means the distribution has a long right tail.
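
Both indications of skewness can be computed by hand. The Python sketch below uses the adjusted sample skewness formula that Excel documents for SKEW, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations. The function name skew and the data are assumptions made for this example.

```python
import statistics


def skew(values):
    """Adjusted sample skewness (the formula documented for Excel's SKEW).

    Requires at least three observations and a nonzero standard deviation.
    """
    n = len(values)
    mean = sum(values) / n
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


right_tailed = [2, 3, 3, 5, 12]            # made-up data with a long right tail
left_tailed = [-x for x in right_tailed]   # mirror image: long left tail

# Informal indication: mean minus median.
informal = statistics.mean(right_tailed) - statistics.median(right_tailed)

formal_right = skew(right_tailed)
formal_left = skew(left_tailed)
```

Here the informal measure and the formal statistic agree in sign: both are positive for the right-tailed sample, and the mirrored sample yields the same skewness with a negative sign.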

4. Kurtosis

Excel’s KURT function returns a value near zero for a bell-shaped (normal) distribution. A more peaked distribution with longer, thin tails will have positive kurtosis. A distribution with a less distinct peak and shorter tails will have negative kurtosis.
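
For readers who want to see what lies behind the number, the Python sketch below follows the sample excess kurtosis formula that Excel documents for KURT. The function name kurt and both data sets are assumptions made for illustration.

```python
import statistics


def kurt(values):
    """Sample excess kurtosis (the formula documented for Excel's KURT).

    Requires at least four observations and a nonzero standard deviation.
    """
    n = len(values)
    mean = sum(values) / n
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]       # no distinct peak, short tails
peaked = [5, 5, 5, 5, 5, 5, 5, 5, -5, 15]    # sharp peak, two far-out values

flat_kurtosis = kurt(flat)
peaked_kurtosis = kurt(peaked)
```

The evenly spread sample gives a negative value and the sharply peaked sample with two distant observations gives a positive one, matching the description above.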

Use Excel’s Tools—Data Analysis suite of procedures to compute histograms and descriptive statistics.