Course Notes: Probability and Statistics – Mrs. Leahy Unit 1: Organizing Data

A. Collecting & Organizing Data

______ is the study of how to collect, organize, analyze, and interpret numerical information from data. This data can represent information that is:

“Qualitative” / ______ or “Quantitative” / ______.

Quantitative data MUST HAVE ______.

Example A1:

BOX in data that is “Qualitative”

STAR beside data that is “Quantitative”

A survey is conducted at a local library collecting the following data from patrons:

Age / Marital Status / Number of Children in Household
Gender / Distance you live from library / Favorite Book Genre

Example A2: Yellow Textbook pg 13 “Just Checking”

Goals in this chapter:

1. Examine data & describe the distribution of the data

2. Choose the best way to organize/display the data

3. Create (by hand) the most common data displays

4. Read/Interpret data displays

B. Histograms & Frequency Tables

Have a large set of quantitative data? Organize into smaller intervals called ______.

A histogram uses ______ to show ______ of classes.

A relative frequency histogram uses bars to show the ______ of cases in each class.

Basic Construction:

Characteristics of a Histogram:

1. Used for high volume quantitative data

2. Bars equal width

3. Bars touch

4. Class limits/class boundaries on x-axis

5. Class frequency/relative frequency on y-axis

6. Classes cannot overlap or be open-ended

7. Use 4-15 classes. (some sources say 5-15)

Example B1: Textbook page 45

Example B2:

This histogram has _____ CLASSES.

The CLASS BOUNDARIES of this bar are ______ to ______.

The FREQUENCY of this class is ______.

OK. Sounds good. Now how do we make a histogram?

I knew you were going to ask that. Here we go…

Example B3:

Time on Hold, in minutes
1 / 5 / 5 / 6 / 7 / 4 / 8 / 7 / 6 / 5
5 / 6 / 7 / 6 / 6 / 5 / 8 / 9 / 9 / 10
7 / 8 / 11 / 2 / 4 / 6 / 5 / 12 / 13 / 6
3 / 7 / 8 / 8 / 9 / 9 / 10 / 8 / 9 / 9

An irate customer called the Dollar Day Mail Order Company 40 times during the last two weeks to see why his order had not arrived. Each time he called, he recorded the length of time he was put “on hold” before being allowed to talk to a customer service representative.

We are going to use five classes to organize our data. (The number of classes will be given to you for homework.) We need to determine how big each interval should be. This is called the “Class width.”

Step 1: Determine Class Width

1. Compute: (largest data value – smallest data value) / (number of classes)

2. ROUND UP to next whole number.

In our example:
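For the Example B3 data, the arithmetic in Step 1 can be checked with a short Python sketch. It assumes the usual rule (range divided by number of classes, then moved up to the next whole number); the function name `class_width` is made up for illustration.

```python
import math

# Time on hold, in minutes (Example B3)
hold_times = [
    1, 5, 5, 6, 7, 4, 8, 7, 6, 5,
    5, 6, 7, 6, 6, 5, 8, 9, 9, 10,
    7, 8, 11, 2, 4, 6, 5, 12, 13, 6,
    3, 7, 8, 8, 9, 9, 10, 8, 9, 9,
]

def class_width(data, num_classes):
    """Range divided by the number of classes, moved up to the next whole number."""
    raw = (max(data) - min(data)) / num_classes
    # Assumption: "next whole number" means strictly greater, so 2.0 becomes 3.
    # Otherwise the largest data value would not fit in the last class.
    return math.floor(raw) + 1

print(class_width(hold_times, 5))  # (13 - 1) / 5 = 2.4, which goes up to 3
```

The same call with `num_classes=4` can be used to check part a) of later exercises.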

Step 2: Determine the Data Range for each class: The Class Limits

Start with Lower Class Limits (LL) (the lowest value in each class)

Lowest data value = lowest Lower Class Limit.

Add Class Width to get the next lower limit, etc.

Fill in Upper Class Limits (UL) (the highest value that fits in the class)
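Here is a rough sketch of Step 2 in Python for whole-number data. It assumes the lowest data value (1) and a class width of 3 for the Example B3 data; `class_limits` is a hypothetical helper name, not something from the textbook.

```python
def class_limits(lowest, width, num_classes):
    """Return (lower limit, upper limit) pairs for whole-number data."""
    limits = []
    for i in range(num_classes):
        lower = lowest + i * width   # add the class width to get each lower limit
        upper = lower + width - 1    # last whole number before the next lower limit
        limits.append((lower, upper))
    return limits

print(class_limits(1, 3, 5))
# [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15)]
```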

Class Limits / Class Boundaries / Tally (optional) / Frequency / Cumulative Frequency / Midpoints

Step 3: Determine the Class Boundaries

Upper Class Boundary = Upper limit + 0.5

Lower Class Boundary = Lower limit – 0.5

Step 4: Determine the Frequency of each class

Class Frequency = # of data values in class (count)

Step 5: Find the Class Midpoint

Class Midpoint = Average of Lower and Upper Limits
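Steps 3 through 5 can be sketched together in Python. This is only a check on hand work, assuming the five classes 1-3, 4-6, 7-9, 10-12, 13-15 for the Example B3 data; the variable names are illustrative.

```python
hold_times = [
    1, 5, 5, 6, 7, 4, 8, 7, 6, 5,
    5, 6, 7, 6, 6, 5, 8, 9, 9, 10,
    7, 8, 11, 2, 4, 6, 5, 12, 13, 6,
    3, 7, 8, 8, 9, 9, 10, 8, 9, 9,
]
limits = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15)]

rows = []
cumulative = 0
for lower, upper in limits:
    boundaries = (lower - 0.5, upper + 0.5)                       # Step 3
    frequency = sum(1 for x in hold_times if lower <= x <= upper) # Step 4 (count)
    cumulative += frequency                                       # running total
    midpoint = (lower + upper) / 2                                # Step 5
    rows.append((lower, upper, boundaries, frequency, cumulative, midpoint))

for lower, upper, b, f, cf, mid in rows:
    print(f"{lower}-{upper}  {b[0]}-{b[1]}  f={f}  cum={cf}  mid={mid}")
```

The last cumulative frequency should equal the number of data values (40 here), which is a quick way to catch a miscount.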

A table (like the one we just made) that shows the classes and corresponding frequencies is called a ______ or ______

Example B4: Use the frequency table from Example B3 to construct a histogram.
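Before drawing by hand, a quick text version of the histogram can help check the bar heights. This sketch assumes the Example B3 frequencies (3, 15, 17, 4, 1) and draws the bars sideways, one `#` per data value; `text_histogram` is a made-up helper name. On paper the bars are vertical and touch.

```python
classes = ["1-3", "4-6", "7-9", "10-12", "13-15"]
frequencies = [3, 15, 17, 4, 1]

def text_histogram(labels, freqs):
    """One line per class: label, a bar of # marks, then the frequency."""
    lines = []
    for label, f in zip(labels, freqs):
        lines.append(f"{label:>5} | {'#' * f} {f}")
    return lines

print("\n".join(text_histogram(classes, frequencies)))
```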

Relative Frequency Table

The relative frequency of a class is the proportion (or percentage) of all data values in that class. It helps us compare the amount of data in each class.

Class Limits / Frequency / Relative Frequency
1 – 3 / 3
4 – 6 / 15
7 – 9 / 17
10 – 12 / 4
13 – 15 / 1
Total:

Step 1: Fill in your class limits and frequencies (from our last example)

Step 2: Compute the Relative Frequency

1. Find the total frequency (sum)

2. Rel. Frequency = class frequency / total frequency
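As a check on the table, the relative frequencies can be computed with a few lines of Python. The sketch takes relative frequency to be the class frequency divided by the total frequency, matching the definition above (proportion of all data values in the class).

```python
class_labels = ["1 - 3", "4 - 6", "7 - 9", "10 - 12", "13 - 15"]
frequencies = [3, 15, 17, 4, 1]

total = sum(frequencies)                      # total frequency
relative = [f / total for f in frequencies]   # proportion in each class

for label, f, r in zip(class_labels, frequencies, relative):
    print(f"{label:>7}  {f:>2}  {r:.3f}")
print(f"  Total  {total}  {sum(relative):.3f}")
```

The relative frequencies should always add up to 1 (or 100%), apart from small rounding differences.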

***NOTATION***

Example B5: Given a data set of numbers {1, 7, 8, 4, 4, 5, 6, 3, 8, 7, 1, 1, 8, 1} and using four classes:

a) Find the class width

b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, and relative frequencies

c) Make a histogram.

d) Make a relative frequency histogram.

Class Limits / Class Boundaries / Frequency / Relative Frequency / Midpoints


Recall a ______ can be used to represent a Frequency Distribution.

A: Distribution Shapes

Symmetric/Mound/Bell Shaped:

two sides are symmetrical with respect to a vertical line that goes through the middle of the graph

Uniform: every class has the same frequency

Bimodal: histogram shows ______ peaks separated by at least one shorter bar

Unimodal: histogram shows ______ peak

Skewed Left: More bars on the left side of the peak… “tail” on the left is longer than on the right

Skewed Right: More bars on the right side of the peak… “tail” on the right is longer than on the left
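The shape definitions above can be turned into rough checks on a list of class frequencies. This is only a sketch: real data is almost never exactly symmetric or uniform, so shape is normally judged by eye. All four function names are made up for illustration, and the second and third frequency lists are hypothetical.

```python
def is_uniform(freqs):
    """Uniform: every class has the same frequency."""
    return len(set(freqs)) == 1

def is_symmetric(freqs):
    """Symmetric: the bars mirror each other about the middle."""
    return list(freqs) == list(freqs)[::-1]

def count_peaks(freqs):
    """Count bars (or flat runs of bars) that are taller than both neighbours."""
    # Collapse runs of equal bars so a flat top counts as one peak
    runs = [f for i, f in enumerate(freqs) if i == 0 or f != freqs[i - 1]]
    padded = [0] + runs + [0]
    return sum(1 for i in range(1, len(padded) - 1)
               if padded[i - 1] < padded[i] > padded[i + 1])

def longer_tail(freqs):
    """Compare how many bars sit on each side of the tallest bar."""
    peak = list(freqs).index(max(freqs))
    left, right = peak, len(freqs) - 1 - peak
    if left > right:
        return "skewed left"
    if right > left:
        return "skewed right"
    return "tails equal"

hold_freqs = [3, 15, 17, 4, 1]       # frequencies from the time-on-hold table
print(count_peaks(hold_freqs))        # 1 peak, so unimodal
print(count_peaks([2, 6, 3, 7, 2]))   # hypothetical frequencies with 2 peaks
print(longer_tail([1, 2, 4, 9, 14]))  # hypothetical: more bars left of the peak
```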

Often a ______ distribution is caused by collecting data from a group of individuals that could have been classified better into two separate groups for that particular data.

Example: heights from a mixed group of men and women

Significant gaps between bars at the left or right can be caused by ______.

These are values that are significantly higher or lower than the rest of your data.

Example: salaries of employees at a major corporation where the CEO makes three times as much as the rest of the workers.

Example A1: Look at Distributions – Textbook pages 50-51

Example A2: Name that Distribution! PowerPoint

B: Dot Plots (Similar to a histogram)

horizontal axis = shows appropriate scale; indicates quantitative data results

vertically = one dot per occurrence of a particular value

Example B1: A handful of pennies were examined and the year of minting was recorded. The information is recorded on the following dotplot.

In which year were the most pennies minted?

How many pennies were minted after 1996?

How many pennies were there total in this handful?

Describe (using a year range) when you think the majority of the pennies in this handful were minted.

A dotplot can be created like this too:

A dotplot can be used to tell a story, much like a histogram.

Example B2: Create a dotplot for the following data set.

12, 15, 16, 16, 14, 12, 14, 18, 19, 14, 15, 18, 16, 13, 15, 16, 13, 10, 18, 16

How many numbers? ______

Lowest? ______ Start with: ______

Highest? ______ End with: ______
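To check a hand-drawn dotplot for Example B2, here is a small text-based sketch in Python. It uses `collections.Counter` to count how often each value occurs and stacks one `*` per occurrence; the layout choices (four characters per column) are arbitrary.

```python
from collections import Counter

data = [12, 15, 16, 16, 14, 12, 14, 18, 19, 14,
        15, 18, 16, 13, 15, 16, 13, 10, 18, 16]

counts = Counter(data)                   # value -> number of dots
lowest, highest = min(data), max(data)
scale = range(lowest, highest + 1)       # keep values with no dots on the axis

lines = []
for level in range(max(counts.values()), 0, -1):   # top row first
    row = "".join(" *  " if counts[v] >= level else "    " for v in scale)
    lines.append(row.rstrip())
lines.append("".join(f"{v:^4}" for v in scale).rstrip())

print(len(data), lowest, highest)
print("\n".join(lines))
```

Values with no data (such as 11 and 17 here) still get a spot on the horizontal axis, just with no dots above them.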

Example B3: Create a dotplot for the data.

C: Frequency Polygon

Sometimes we are interested in a frequency polygon.

Start with your histogram data.

Instead of drawing a bar, plot a dot at the midpoint of each class and connect the dots with line segments.
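The points for a frequency polygon can be listed with a short Python sketch, using the time-on-hold classes as sample data. Two conventions here are assumptions, since the notes do not spell them out: closing the polygon with a zero-frequency point one class width beyond each end, and plotting the cumulative version at the upper class boundaries.

```python
from itertools import accumulate

limits = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15)]
frequencies = [3, 15, 17, 4, 1]

# One dot per class: (class midpoint, frequency)
midpoints = [(lower + upper) / 2 for lower, upper in limits]
points = list(zip(midpoints, frequencies))

# Assumed convention: bring the line down to zero on both ends
width = limits[1][0] - limits[0][0]
closed = [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]

# Cumulative version (assumed convention): upper boundary vs. running total
cumulative = list(accumulate(frequencies))
ogive = [(limits[0][0] - 0.5, 0)] + [
    (upper + 0.5, total) for (_, upper), total in zip(limits, cumulative)
]

print(points)
print(cumulative)
```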

Example C1: Construct a frequency polygon for the following data.

Example C2: Construct a cumulative frequency polygon for the following data.

A. Exploratory Data Analysis (EDA)

Exploratory Data Analysis techniques are used to explore a data set, to detect patterns and extreme data values, to raise new questions, or to pursue leads in many directions.

Useful when data has been gathered for ______.

For example: Ages of Applicants of Graduate Programs

B. Stem-and-Leaf Display

Used for ______ data.

Best with small to medium size sets.

A stem-and-leaf display is used to ______ order and arrange data into groups.

The ______ are aligned vertically from smallest to largest.

A vertical line is drawn to the right of the stems.

The ______ with the same stem are placed in the same row as the stem, arranged in ______ order.

A label (Key) is used to indicate the magnitude of the numbers in the display.

Example B1:

A study on peanut butter reported the following optimal consumption temperatures for various brands:

56 44 62 36 39 53 50 65 45 40

Make a stem-and-leaf display for this data.

Step 1: Identify appropriate stem values.

List smallest to largest. No omissions!

Step 2: List leaves with corresponding stems

In numeric order smallest to largest!

Step 3: Include Key and Title
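The three steps can be sketched in Python for the peanut butter data in Example B1. This version assumes two-digit whole numbers, so the tens digit is the stem and the ones digit is the leaf; `stem_and_leaf` is an illustrative name.

```python
temps = [56, 44, 62, 36, 39, 53, 50, 65, 45, 40]

def stem_and_leaf(data):
    """Return display lines with tens digits as stems and ones digits as leaves."""
    leaves_by_stem = {}
    for value in sorted(data):               # sorting puts leaves in numeric order
        stem, leaf = divmod(value, 10)
        leaves_by_stem.setdefault(stem, []).append(leaf)
    lines = []
    # Step 1: list every stem from smallest to largest, with no omissions
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        # Step 2: leaves go in the same row as their stem
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Step 3: title and key
print("Optimal Consumption Temperatures")
print("\n".join(stem_and_leaf(temps)))
print("Key: 3 | 6 means 36")
```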

Example B2:

For the following data, use the first two digits as the stem to make a stem-and-leaf display.

106 94 112 96 89 113 90 85 85 100

Example B3: Describe the distributions of the following stem/leaf displays.

C: Stem & Leaf Special Cases

Splitting the Stems

idea: for lots of data, use TWO (or more) intervals instead of one for the stem.

Consider:

0 0 1 2 3 3 4 5 5 7 7 8 9 9 9

Using only one stem “0” would give us an overcrowded graph. Instead of using an interval of 0-9, maybe we could use TWO intervals.

Example C1: Make a stem-and-leaf display using

a) Two intervals: 0-4, 5-9

b) Five intervals: 0-1, 2-3, 4-5, 6-7, 8-9
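Splitting the stems can be sketched in Python as well. The helper below is hypothetical; it assumes whole-number data and that the number of leaf digits per row divides 10 evenly (5 digits per row gives two intervals, 2 digits per row gives five).

```python
data = [0, 0, 1, 2, 3, 3, 4, 5, 5, 7, 7, 8, 9, 9, 9]

def split_stem_display(data, leaves_per_row):
    """Repeat each stem so every row holds only `leaves_per_row` leaf digits."""
    rows = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        rows.setdefault((stem, leaf // leaves_per_row), []).append(leaf)
    stems = [value // 10 for value in data]
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        for part in range(10 // leaves_per_row):   # one row per interval
            leaves = " ".join(str(leaf) for leaf in rows.get((stem, part), []))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines

print("\n".join(split_stem_display(data, 5)))   # a) intervals 0-4 and 5-9
print()
print("\n".join(split_stem_display(data, 2)))   # b) intervals 0-1, 2-3, ..., 8-9
```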

Example C2: Britney is a swimmer training for a competition. The number of 50-meter laps she swam each day for 30 days is as follows:

a) Prepare a stem-and-leaf plot.

b) Redraw the stem-and-leaf plot using two unit intervals.

c) Make a comment on what these plots show.

Back-To-Back Stem-and-Leaf Plots

If you are comparing two sets of data, you can use a back-to-back stem-and-leaf plot.

Example D1: The following class sizes were reported in Economics 101 and Math 151:

Econ 101: 20, 34, 27, 15, 24, 35, 38, 28

Math 151: 14, 18, 21, 34, 29, 13, 32, 23

Make a back-to-back stem-and-leaf plot for the data.
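A back-to-back plot for Example D1 can be sketched in Python as follows. The stems run down the middle, and the left-hand leaves are written so they increase moving away from the stem, which is the usual convention and an assumption here. Function names are illustrative.

```python
econ_101 = [20, 34, 27, 15, 24, 35, 38, 28]
math_151 = [14, 18, 21, 34, 29, 13, 32, 23]

def leaves_by_stem(data):
    """Map each tens digit to its sorted list of ones digits."""
    table = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        table.setdefault(stem, []).append(leaf)
    return table

def back_to_back(left, right):
    left_t, right_t = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(left_t), min(right_t)),
                  max(max(left_t), max(right_t)) + 1)
    # Left leaves are reversed so they grow outward from the stem
    left_text = {s: " ".join(str(x) for x in reversed(left_t.get(s, [])))
                 for s in stems}
    pad = max(len(text) for text in left_text.values())
    lines = []
    for s in stems:
        right_text = " ".join(str(x) for x in right_t.get(s, []))
        lines.append(f"{left_text[s]:>{pad}} | {s} | {right_text}".rstrip())
    return lines

print("Econ 101 |   | Math 151")
print("\n".join(back_to_back(econ_101, math_151)))
print("Key: 5 | 1 | 3 means 15 in Econ 101 and 13 in Math 151")
```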

A: Bar Graph

Clustered Bar Graph: two or more bars for each value on the horizontal axis; clusters are uniformly spaced

Example A1:

Make a clustered bar graph for the following data.

Month / Ave. High Temp, Ft. Myers, FL / Ave. High Temp, Indianapolis, IN
January / 75 / 34
March / 80 / 49
May / 89 / 72
July / 92 / 84
September / 91 / 76
November / 81 / 51
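As a rough preview of the clustered bar graph for the temperature table in Example A1, here is a text-only Python sketch. Each month gets a cluster of two sideways bars (F for Ft. Myers, I for Indianapolis). The scale of one mark per 4 degrees and the helper name `clustered_bars` are arbitrary choices for illustration.

```python
# (month, Ft. Myers average high, Indianapolis average high) from the table
avg_highs = [
    ("January", 75, 34),
    ("March", 80, 49),
    ("May", 89, 72),
    ("July", 92, 84),
    ("September", 91, 76),
    ("November", 81, 51),
]

def clustered_bars(rows, degrees_per_mark=4):
    """Two bars per month; a blank line separates the clusters."""
    lines = []
    for month, ft_myers, indianapolis in rows:
        lines.append(f"{month:<10} F {'#' * (ft_myers // degrees_per_mark)} {ft_myers}")
        lines.append(f"{'':<10} I {'=' * (indianapolis // degrees_per_mark)} {indianapolis}")
        lines.append("")   # blank line keeps the clusters evenly spaced
    return lines

print("\n".join(clustered_bars(avg_highs)))
```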

Another type of bar graph is a SEGMENTED BAR GRAPH:

In this graph each bar is a whole and is divided proportionally based on the conditional distributions for each variable.

Example A2:

Use the contingency table to construct a segmented bar graph.

B: Pareto Chart

bars arranged by frequency, highest to lowest
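The only extra step in a Pareto chart is the sorting, which a short Python sketch can show. The category names and counts below are HYPOTHETICAL placeholders made up for illustration; they are not data from these notes.

```python
# HYPOTHETICAL counts, for illustration only
sales = {"Type A": 12, "Type B": 30, "Type C": 7, "Type D": 21}

# Arrange the bars by frequency, highest to lowest
pareto_order = sorted(sales.items(), key=lambda item: item[1], reverse=True)

for name, count in pareto_order:
    print(f"{name:<7} {'#' * count} {count}")
```

On paper the bars are drawn in this order from left to right, so the tallest bar comes first.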

Example B1: A sandwich shop records the number of each kind of sandwich sold last Friday. The numbers are recorded in the chart:

Design a Pareto Chart below for the types of sandwiches sold last Friday.

C: Pie Charts/Circle Graphs

Wedges visually display proportional parts of the total population as a percentage or as a portion of 360°

Good for qualitative/categorical data

The graph should have a title and wedges should be well labeled or have a key/legend.

Josh Sundquist’s Pie Charts for Math Nerds:

How do you make a circle graph by hand?

Step 1: Determine your grand total (if it’s not given)

Step 2: Determine the PERCENTAGE represented by each category

Percentage in each category = # in category / Total

Step 3: Determine the number of DEGREES represented by each category

Degrees of category = Percentage of category x 360°

Step 4: Use a PROTRACTOR to mark off the correct number of degrees, one wedge at a time

Example C1: Let’s start easy

Make a circle graph for the following data

Elementary Teachers of Local Schools
Year / 1995
Male / 15
Female / 40
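Steps 1 through 3 for the teacher data in Example C1 can be checked with a short Python sketch. Step 4 (the protractor) still has to be done by hand. The dictionary name `wedges` is made up for illustration.

```python
counts = {"Male": 15, "Female": 40}

total = sum(counts.values())              # Step 1: grand total

wedges = {}
for category, n in counts.items():
    fraction = n / total                  # Step 2: share of the total
    degrees = fraction * 360              # Step 3: degrees for this wedge
    wedges[category] = (fraction, degrees)
    print(f"{category}: {fraction:.1%} -> {degrees:.1f} degrees")
```

The wedge angles should add up to 360 degrees, which is a useful check before picking up the protractor.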

Example C2: Make a circle graph for the following data

D: Time Series Graph (Line Graph)

Data are plotted in order of occurrence at regular intervals over time. Dots are connected using line segments.

Example D1: Make a time series graph for the following data.

Year / 1990 / 1995 / 2000 / 2005 / 2010
Enrollment / 30 / 34 / 32 / 40 / 52
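For the enrollment data in Example D1, the dots and the connecting line segments can be listed with a small Python sketch. The change across each segment is included because it shows whether the line rises or falls; the variable names are illustrative.

```python
years = [1990, 1995, 2000, 2005, 2010]
enrollment = [30, 34, 32, 40, 52]

points = list(zip(years, enrollment))      # one dot per year, in time order
segments = list(zip(points, points[1:]))   # each dot joined to the next one
changes = [end[1] - start[1] for start, end in segments]

for (start, end), change in zip(segments, changes):
    print(f"{start[0]} to {end[0]}: {change:+d}")
```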

E: Displaying Data

Determine whether the statement is true or false.

  1. In a bar graph, the bars do not have to be of uniform width.
  2. The bars in a bar graph can be vertical or horizontal.
  3. The lengths of the bars in a bar graph stand for certain values of the variable being displayed.
  4. When two or more variables are displayed together, the bar graph is called a clustered bar graph (or a comparative bar graph).
  5. In a Pareto chart, the bars are arranged from left to right according to increasing height.
  6. A circle graph is also called a pie chart.
  7. Circle graphs are usually used to display percentages.
  8. Time series data contain the values of a variable taken at regular intervals over a certain time period.

THINK ABOUT IT.

Best for what kind of data?

What can you “see” from the display?

Bar Graph

Pareto Chart

Circle Graphs

Time-Series Graph

Histogram

Dotplot

Stem & Leaf Plot

ALL GRAPHS:

Provide a title, label the axes, and identify units of measure.
