Section 2.2 Frequency Distributions

This section will focus on ways to organize categorical data and numerical data into tables, charts, and graphs.

Frequency: the number of times a particular value of a categorical or quantitative variable occurs in a data set.

Frequency Distribution: a listing of the distinct values of categorical or quantitative data and how often each value occurs. A frequency distribution can be displayed in tabular form or as a graph.

Procedure 1: Frequency Distribution for Categorical or Quantitative Data

1.  List the distinct values of the observations for the variable in the data set in the first column of a table.

2.  For each observation, make a tally mark in the second column of the table in the row for the appropriate distinct value.

3.  Count the tallies for each distinct value and record the totals in the third column of the table.
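Procedure 1 can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function name `frequency_distribution` and the eye-color list are hypothetical, and the tally column is simplified to one `|` per observation (hand tallies are usually grouped in fives).

```python
from collections import Counter

def frequency_distribution(observations):
    """Return (value, tally, frequency) rows, one row per distinct value."""
    counts = Counter(observations)      # tally each observation and count the tallies
    rows = []
    for value in sorted(counts):        # list the distinct values in sorted order
        freq = counts[value]
        rows.append((value, "|" * freq, freq))
    return rows

# Hypothetical eye-color data (not the class survey from Example 1).
eye_colors = ["brown", "blue", "brown", "green", "hazel", "brown", "blue"]
for value, tally, freq in frequency_distribution(eye_colors):
    print(f"{value:<6} {tally:<5} {freq}")
```

The frequencies in the third column always add up to the number of observations, which is a quick check on the tallying.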

Example 1:

Make frequency distributions for the class for the following two variables:

1.  Eye color of the students in the class.

2.  The number of 8-ounce glasses of water a student drinks in a 24-hour period.

Bar Chart: A graphical display of categorical data where the distinct values of the variable are on the horizontal axis and the frequency (relative frequency or percent) is on the vertical axis. In a bar chart, the bars do not touch.

Procedure 2: Construct a Bar Chart

1.  Obtain a frequency (relative frequency, percent) distribution of the data.

2.  Draw a horizontal axis on which to place the bars and a vertical axis to display the frequencies (relative frequencies, percents). Remember that the data are categorical, so the bars should be positioned so that they do not touch each other.

3.  For each distinct value, construct a vertical bar whose height equals the frequency (relative frequency, percent) of that class.

4.  Label the bars with the distinct values, the horizontal axis with the name of the variable, and the vertical axis with “Frequency” (or “Relative Frequency” or “Percent”).
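As a rough sketch of Procedure 2, the function below draws a vertical bar chart with `#` characters. The function name `text_bar_chart`, the `gap` parameter, and the frequencies are hypothetical; a plotting library would normally be used instead. The point is the layout: each bar sits over one distinct value and blank columns separate the bars, so they do not touch.

```python
def text_bar_chart(freq, gap=2):
    """Vertical text bar chart for categorical data.

    `freq` maps each distinct value to its frequency. Each bar is as wide as
    the longest label, and bars are separated by `gap` blank columns so that
    they do not touch.
    """
    labels = list(freq)
    width = max(len(label) for label in labels)
    sep = " " * gap
    lines = []
    for level in range(max(freq.values()), 0, -1):   # tallest row first
        cells = ["#" * width if freq[label] >= level else " " * width for label in labels]
        lines.append(f"{level:>3} | " + sep.join(cells))
    lines.append("    +-" + "-" * (len(labels) * width + (len(labels) - 1) * gap))
    lines.append("      " + sep.join(label.ljust(width) for label in labels))
    return "\n".join(line.rstrip() for line in lines)

# Hypothetical eye-color frequencies (not the class survey).
print(text_bar_chart({"blue": 2, "brown": 3, "green": 1, "hazel": 1}))
```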

Example 2

For the eye color data from Example 1, draw a bar chart of the frequencies.

Histogram: A graphical display of numerical data where the values of the numerical variable are on the horizontal axis and the frequency (relative frequency, percent) is on the vertical axis. Unlike the bars in a bar chart, adjacent bars in a histogram touch.

Procedure 3: Construct a Histogram

1.  Obtain a frequency (relative frequency, percent) distribution of the numerical data.

2.  Draw a horizontal axis on which to place the bars and a vertical axis to display the frequencies (relative frequencies, percents).

3.  For each distinct value, construct a vertical bar whose height equals the frequency (relative frequency, percent) of that class.

4.  Label the bars with the distinct values, the horizontal axis with the name of the variable, and the vertical axis with “Frequency” (or “Relative Frequency” or “Percent”).
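For comparison with the bar chart, here is a minimal text sketch of Procedure 3 for whole-number data, one bar per value. The name `text_histogram`, the `bar_width` parameter, and the glasses-of-water data are hypothetical. Every value from the minimum to the maximum gets a column and the columns are placed side by side with no gap, so adjacent bars touch; the only blank space appears above a value with frequency 0 (here, 7 glasses).

```python
from collections import Counter

def text_histogram(data, bar_width=3):
    """Single-value histogram for whole-number data, drawn with '#'."""
    counts = Counter(data)                       # missing values count as 0
    values = range(min(data), max(data) + 1)     # every value on the axis, in order
    lines = []
    for level in range(max(counts.values()), 0, -1):
        cells = ["#" * bar_width if counts[v] >= level else " " * bar_width for v in values]
        lines.append(f"{level:>3} | " + "".join(cells))   # no separator: bars touch
    lines.append("    +-" + "-" * (bar_width * len(values)))
    lines.append("      " + "".join(str(v).center(bar_width) for v in values))
    return "\n".join(line.rstrip() for line in lines)

# Hypothetical glasses-of-water data (not the class data from Example 1).
glasses = [2, 3, 3, 4, 4, 4, 5, 5, 6, 8]
print(text_histogram(glasses))
```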

Example 3

Draw a histogram of the data in Example 1 for the number of 8-ounce glasses of water students in the class drink over a 24-hour period.

Dot plot: a graph that can be used to show the distribution of a numerical variable when the sample size is small.

Procedure 4: Construct a Dot Plot

1.  Draw a horizontal axis to display the possible values of the quantitative data.

2.  Record each observation by placing a dot over the appropriate value on the horizontal axis.

3.  Label the horizontal axis with the name of the variable.
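Procedure 4 can be sketched the same way for whole-number data. The name `text_dot_plot`, the `spacing` parameter, and the data below are hypothetical. Each observation becomes one `o` stacked above its value, so the height of each stack equals that value's frequency.

```python
from collections import Counter

def text_dot_plot(data, variable_name, spacing=3):
    """Dot plot for whole-number data: one 'o' per observation."""
    counts = Counter(data)
    values = range(min(data), max(data) + 1)   # axis shows every possible value
    width = spacing * len(values)
    lines = []
    for level in range(max(counts.values()), 0, -1):   # stack the dots, top row first
        cells = ["o".center(spacing) if counts[v] >= level else " " * spacing for v in values]
        lines.append("".join(cells))
    lines.append("-" * width)                                       # horizontal axis
    lines.append("".join(str(v).center(spacing) for v in values))   # axis values
    lines.append(variable_name.center(width))                       # axis label
    return "\n".join(line.rstrip() for line in lines)

# Hypothetical data (not the class survey).
glasses = [2, 3, 3, 4, 4, 4, 5, 5, 6, 8]
print(text_dot_plot(glasses, "Glasses of water"))
```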

Example 4

Draw a dot plot of the data in Example 1 for the number of 8-ounce glasses of water a random sample of 20 students in the class drink over a 24-hour period.

Notice that the height of each stack of dots matches the height of the corresponding bar in the histogram.

Sometimes, instead of the frequency of each distinct value of a qualitative variable, we want the proportion (relative frequency) of each distinct value.

Relative frequency: the ratio of the frequency of a value to the total number of observations.
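A small sketch of this ratio, using hypothetical eye-color data and a hypothetical helper named `relative_frequencies`. Multiplying a relative frequency by 100 gives the percent, and the relative frequencies always sum to 1 (up to rounding).

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each distinct value to frequency / total number of observations."""
    n = len(observations)
    return {value: count / n for value, count in Counter(observations).items()}

# Hypothetical eye-color data (not the class survey).
eye_colors = ["brown", "blue", "brown", "green", "hazel", "brown", "blue", "brown"]
rel = relative_frequencies(eye_colors)
percent = {value: 100 * r for value, r in rel.items()}
print(rel)   # {'brown': 0.5, 'blue': 0.25, 'green': 0.125, 'hazel': 0.125}
```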

Example 5

Draw a relative frequency bar chart from the eye color frequency distribution found in Example 1.

Quantitative data can be organized by dividing the observations into classes (groups).

Why do we organize data into groups?

Grouped data shows the shape of the data set more clearly than single-value data does, especially when the data set has many distinct values.

One way of grouping quantitative data into classes is called cutpoint grouping. Each class contains a range of values, from its lower cutpoint up to, but not including, its upper cutpoint.

Terms used in Cutpoint Grouping

Lower class cutpoint: The smallest value that can go into a class.

Upper class cutpoint: The smallest value that can go into the next higher class.

Class width: The difference between the upper and lower cutpoints of a class.

Class midpoint: The average of the two cutpoints in a class.
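As a quick check of these definitions, take a hypothetical class running from 10 up to, but not including, 15. Its lower cutpoint is 10 and its upper cutpoint is 15, so:

```latex
\text{class width} = 15 - 10 = 5
\qquad
\text{class midpoint} = \frac{10 + 15}{2} = 12.5
```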

In general, how do we choose the class width? First, decide on the (approximate) number of classes, normally between 5 and 20.

Procedure 5: Determine the Class Width and Classes for Cutpoint Grouping

1.  Calculate an approximate class width as

    Approximate class width = (Maximum observation − Minimum observation) / (Number of classes)

    - If the answer is a decimal, then round up to the next whole number; that is your class width.

    - If the answer is a whole number, then that is your class width.

2.  Choose a number for the lower cutpoint of the first class, noting that it must be less than or equal to the minimum observation.

3.  Obtain the other lower cutpoints by successively adding the class width chosen in Step 1.

4.  Use the results obtained in Step 3 to specify all of the classes.
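The class-building procedure above can be sketched in Python for whole-number data. This is a minimal illustration under stated assumptions: the helper names `cutpoint_classes` and `grouped_frequency_table`, the data, and the choice of 4 classes with a first cutpoint of 10 are all hypothetical. Each class `(lower, upper)` means "lower up to, but not including, upper", and the table includes a relative frequency column as in Example 6.

```python
import math

def cutpoint_classes(data, num_classes, first_cutpoint=None):
    """Return (class_width, classes) for cutpoint grouping of whole-number data."""
    approx = (max(data) - min(data)) / num_classes
    # Round a decimal approximate width up; a whole number is used as is.
    # (max(1, ...) guards against a zero width when all values are equal.)
    width = max(1, math.ceil(approx))
    # First lower cutpoint, defaulting to the minimum observation.
    lower = min(data) if first_cutpoint is None else first_cutpoint
    if lower > min(data):
        raise ValueError("the first lower cutpoint must be <= the minimum observation")
    classes = []
    while lower <= max(data):          # keep adding the width until the maximum is covered
        classes.append((lower, lower + width))
        lower += width
    return width, classes

def grouped_frequency_table(data, num_classes, first_cutpoint=None):
    """Rows of (lower, upper, frequency, relative frequency)."""
    width, classes = cutpoint_classes(data, num_classes, first_cutpoint)
    n = len(data)
    rows = []
    for lower, upper in classes:
        freq = sum(lower <= x < upper for x in data)   # lower cutpoint in, upper cutpoint out
        rows.append((lower, upper, freq, freq / n))
    return width, rows

# Hypothetical whole-number data (not the data from Example 6).
data = [12, 15, 18, 21, 22, 25, 27, 30, 31, 34, 38, 40, 41, 45, 49, 52]
width, rows = grouped_frequency_table(data, num_classes=4, first_cutpoint=10)
print("class width:", width)
for lower, upper, freq, rel in rows:
    print(f"{lower} - under {upper}   {freq:>2}   {rel:.4f}")
```

Here the approximate width is (52 − 12) / 4 = 10, a whole number, so the width is 10. Because the classes are added until the maximum observation is covered, the number of classes may come out one more than the number chosen (5 instead of 4 here), which is why that number is only approximate.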

Example 6 from Elementary Statistics by Neil A. Weiss

1.  Construct a frequency distribution using cutpoint grouping. Add a final column for the relative frequencies.

2.  Draw a grouped-data frequency histogram.

Stem and Leaf Diagram

A stem and leaf diagram is a visual display of the “raw” data.

In this type of diagram, all digits of an observation except the rightmost one form the stem, and the rightmost digit forms the leaf.

Procedure 6: Construct a Stem and Leaf Diagram

1.  Think of each observation as a stem, consisting of all but the rightmost digit, and a leaf, the rightmost digit.

2.  Write the stems from smallest to largest in a vertical column to the left of a vertical rule.

3.  Write each leaf to the right of the vertical rule in the row that contains the appropriate stem.

4.  Arrange the leaves in each row in ascending order.
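A minimal sketch of the stem and leaf steps for non-negative whole numbers, using a hypothetical function `stem_and_leaf` and hypothetical data. The stem is `x // 10` (every digit except the rightmost) and the leaf is `x % 10` (the rightmost digit). This sketch also gives a row to any stem between the smallest and largest that has no leaves.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Stem and leaf lines for non-negative whole numbers."""
    leaves = defaultdict(list)
    for x in data:
        leaves[x // 10].append(x % 10)      # write each leaf in the row of its stem
    lines = []
    # Stems from smallest to largest, to the left of a vertical rule.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in sorted(leaves[stem]))   # leaves in ascending order
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines

# Hypothetical data, deliberately unsorted.
data = [27, 12, 45, 31, 18, 22, 40, 52, 15, 38, 21, 49, 34, 25, 41, 30]
print("\n".join(stem_and_leaf(data)))
```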

Example 7 from Elementary Statistics by Neil A. Weiss

Distribution of a Data Set: a table, graph, or formula that provides the values of the observations and how often they occur.

So far we have looked at how to construct different distributions, but we have not discussed their shapes. To describe the shape of a distribution, it is best to picture a smooth curve that approximates the overall shape of its graph.

The common distribution shapes are bell-shaped, triangular, uniform, right skewed, left skewed, bimodal, and multimodal.

Modality

The number of peaks (high points) in the shape of the distribution.

unimodal: has one peak.

bimodal: has two peaks.

multimodal: has three or more peaks.

Symmetry and Skewness

symmetric distribution: one that can be divided by a vertical line into two pieces that are mirror images of one another. Symmetric distributions include bell-shaped, triangular, and uniform distributions.

skewness: a unimodal distribution that is not symmetric is skewed, either right skewed (its right tail is longer) or left skewed (its left tail is longer).

tail: the side of a skewed distribution that stretches out longer than the other side.

Population and Sample Distributions

Population data: The values of a variable for the entire population.

Sample data: the values of a variable for a sample of the population.

Population and Sample Distributions; Distribution of a Variable

Population distribution (also called the distribution of the variable): the distribution of population data.

Sample distribution: the distribution of sample data.

For a simple random sample, the sample distribution approximates the population distribution (the distribution of the variable under consideration). The larger the sample size, the better the approximation tends to be.
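A small simulation can illustrate this claim. It is a sketch under stated assumptions: the population of 100,000 eye colors and its proportions are hypothetical, and `random.Random.sample` is used to draw a simple random sample (without replacement). For each sample size it prints the largest gap between a sample relative frequency and the matching population relative frequency. The exact numbers depend on the random seed, but the gaps tend to shrink as the sample size grows.

```python
import random
from collections import Counter

def relative_frequencies(values):
    """Map each distinct value to frequency / number of values."""
    n = len(values)
    return {k: c / n for k, c in Counter(values).items()}

# Hypothetical population of 100,000 eye colors.
population = ["brown"] * 40_000 + ["blue"] * 30_000 + ["green"] * 20_000 + ["hazel"] * 10_000
pop_rf = relative_frequencies(population)

def max_difference(sample_size, rng):
    """Largest |sample relative frequency - population relative frequency|."""
    sample = rng.sample(population, sample_size)   # simple random sample
    sample_rf = relative_frequencies(sample)
    return max(abs(sample_rf.get(k, 0.0) - p) for k, p in pop_rf.items())

rng = random.Random(1)
for n in (20, 200, 2000, 20000):
    print(n, round(max_difference(n, rng), 4))
```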

Example 8 from Elementary Statistics by Neil A. Weiss

Example 9