Course Notes: Probability and Statistics – Mrs. Leahy Unit 1: Organizing Data
A. Collecting & Organizing Data
______ is the study of how to collect, organize, analyze, and interpret numerical
information from data. This data can represent information that is:
“Qualitative” / ______ or “Quantitative” / ______.
Quantitative data MUST HAVE ______.
Example A1:
BOX the data that is “Qualitative.” Put a STAR beside data that is “Quantitative.”
A survey is conducted at a local library collecting the following data from patrons:
Age / Marital Status / Number of Children in Household
Gender / Distance you live from library / Favorite Book Genre
Example A2: Yellow Textbook pg 13 “Just Checking”
Goals in this chapter:
1. Examine data & describe the distribution of the data
2. Choose the best way to organize/display the data
3. Create (by hand) the most common data displays
4. Read/Interpret data displays
B. Histograms & Frequency Tables
Have a large set of quantitative data? Organize into smaller intervals called ______.
A histogram uses ______ to show ______ of classes.
A relative frequency histogram uses bars to show the ______ of cases in each class.
Basic Construction: Characteristics of a Histogram:
1. Used for high volume quantitative data
2. Bars equal width
3. Bars touch
4. Class limits/class boundaries on x-axis
5. Class frequency/relative frequency on y-axis
6. Classes cannot overlap or be open-ended
7. Use 4-15 classes (some sources say 5-15).
Example B1: Textbook page 45
Example B2:
This histogram has _____ CLASSES.
The CLASS BOUNDARIES of this bar are ______ to ______.
The FREQUENCY of this class is ______.
OK. Sounds good. Now how do we make a histogram?
I knew you were going to ask that. Here we go…
Example B3:
Time on Hold, in minutes:
1 / 5 / 5 / 6 / 7 / 4 / 8 / 7 / 6 / 5
5 / 6 / 7 / 6 / 6 / 5 / 8 / 9 / 9 / 10
7 / 8 / 11 / 2 / 4 / 6 / 5 / 12 / 13 / 6
3 / 7 / 8 / 8 / 9 / 9 / 10 / 8 / 9 / 9
An irate customer called the Dollar Day Mail Order Company 40 times during the last two weeks to see why his order had not arrived. Each time he called, he recorded the length of time he was put “on hold” before being allowed to talk to a customer service representative.
We are going to use five classes to organize our data. (The number of classes will be given to you for homework.) We need to determine how big each interval should be. This is called the “Class width.”
Step 1: Determine Class Width
1. Compute: Class width = (largest data value - smallest data value) ÷ (number of classes)
2. ROUND UP to the next whole number.
In our example:
Step 2: Determine the Data Range for each class: The Class Limits
Start with Lower Class Limits (LL) (the lowest value that fits in the class)
Lowest data value = Lowest Class Limit.
Add Class Width to get the next lower class limit, etc.
Fill in Upper Class Limits (UL) (the highest value that fits in the class). For whole-number data, each upper limit is one less than the next lower limit.
Class Limits / Class Boundaries / Tally (optional) / Frequency / Cumulative Frequency / Midpoints
Step 3: Determine the Class Boundaries (whole-number data)
Upper Class Boundary = Upper limit + 0.5
Lower Class Boundary = Lower limit – 0.5
Step 4: Determine the Frequency of each class
Class Frequency = # of data values in class (count)
Cumulative Frequency = frequency of this class + frequencies of all earlier classes
Step 5: Find the Class Midpoint
Class Midpoint = Average of Lower and Upper Limits
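The five steps above can be sketched in Python as a check on hand work. This is a minimal sketch, not part of the course materials: the helper names `class_width` and `frequency_table` are hypothetical, the data are assumed to be whole numbers (so each upper limit is one less than the next lower limit and boundaries sit 0.5 away), and "round up to the next whole number" is read as always moving up, even when the quotient is already whole, so the largest value always lands in the last class. For the hold times, (13 - 1) ÷ 5 = 2.4, which rounds up to 3.

```python
import math

def class_width(data, num_classes):
    """Step 1: (largest - smallest) / number of classes, moved up to the next whole number."""
    raw = (max(data) - min(data)) / num_classes
    # floor + 1 moves up even when raw is already whole, so the maximum still fits.
    return math.floor(raw) + 1

def frequency_table(data, num_classes):
    """Steps 2-5 for whole-number data: limits, boundaries, frequency, cumulative, midpoint."""
    width = class_width(data, num_classes)
    rows = []
    cumulative = 0
    lower = min(data)                                    # Step 2: lowest value = first lower limit
    for _ in range(num_classes):
        upper = lower + width - 1                        # highest whole number that fits
        freq = sum(1 for x in data if lower <= x <= upper)   # Step 4: count
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),    # Step 3
            "frequency": freq,
            "cumulative": cumulative,
            "midpoint": (lower + upper) / 2,             # Step 5
        })
        lower += width                                   # next lower limit
    return width, rows

hold_times = [1, 5, 5, 6, 7, 4, 8, 7, 6, 5,
              5, 6, 7, 6, 6, 5, 8, 9, 9, 10,
              7, 8, 11, 2, 4, 6, 5, 12, 13, 6,
              3, 7, 8, 8, 9, 9, 10, 8, 9, 9]

width, table = frequency_table(hold_times, 5)
print("Class width:", width)
for row in table:
    print(row["limits"], row["boundaries"], row["frequency"], row["cumulative"], row["midpoint"])
```

Run on the Example B3 hold times, it reports a class width of 3, classes 1-3 through 13-15, and frequencies 3, 15, 17, 4, 1, the same counts used in the Relative Frequency Table section.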
A table (like the one we just made) that shows the classes and corresponding frequencies is called a
______ or ______.
Example B4: Use the frequency table from Example B3 to construct a histogram.
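One quick way to check a frequency table before drawing the histogram by hand is a text histogram. This is a rough sketch with the bars turned sideways; `text_histogram` is a hypothetical helper. It prints one row per class, labeled by its class boundaries (so neighboring bars share an edge, just as histogram bars touch), with one `#` per data value, using the Example B3 boundaries and frequencies.

```python
# Example B3: classes 1-3, 4-6, 7-9, 10-12, 13-15 and their frequencies.
boundaries = [0.5, 3.5, 6.5, 9.5, 12.5, 15.5]
frequencies = [3, 15, 17, 4, 1]

def text_histogram(boundaries, frequencies, symbol="#"):
    """Return one line per class: boundary range, then one symbol per data value."""
    lines = []
    for i, freq in enumerate(frequencies):
        label = f"{boundaries[i]:>5} - {boundaries[i + 1]:<5}"
        lines.append(f"{label}| {symbol * freq} ({freq})")
    return lines

for line in text_histogram(boundaries, frequencies):
    print(line)
```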
Relative Frequency Table
The relative frequency of a class is the proportion (or percentage) of all data values in that class. It helps us compare the amount of data in each class.
Class Limits / Frequency / Relative Frequency
1 – 3 / 3
4 – 6 / 15
7 – 9 / 17
10 – 12 / 4
13 – 15 / 1
Total:
Step 1: Fill in your class limits and frequencies
(from our last example)
Step 2: Compute the Relative Frequency
1. Find the total frequency (sum)
2. Rel. Frequency = class frequency ÷ total frequency
***NOTATION***
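Step 2 amounts to dividing each class frequency by the total frequency. A minimal sketch using the Example B3 frequencies; as a check, the relative frequencies should add to 1 (up to rounding).

```python
classes = ["1 - 3", "4 - 6", "7 - 9", "10 - 12", "13 - 15"]
frequencies = [3, 15, 17, 4, 1]          # from the Example B3 frequency table

n = sum(frequencies)                      # Step 2.1: total frequency
relative = [f / n for f in frequencies]   # Step 2.2: class frequency / total frequency

for label, f, rf in zip(classes, frequencies, relative):
    print(f"{label:>8} | {f:>3} | {rf:.3f}")
print(f"Total: {n} | {sum(relative):.3f}")
```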
Example B5: Given a data set of numbers {1, 7, 8, 4, 4, 5, 6, 3, 8, 7, 1, 1, 8, 1} and using four classes
a) Find the class width
b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, and
relative frequencies
c) Make a histogram.
d) Make a relative frequency histogram.
Class Limits / Class Boundaries / Frequency / Relative Frequency / Midpoints
Recall a ______ can be used to represent a Frequency Distribution.
A: Distribution Shapes
Symmetric/Mound/Bell Shaped:
two sides are symmetrical with respect to a vertical line that goes through the middle of the graph
Uniform: every class has the same frequency
Bimodal: histogram shows ______ peaks separated by at least one shorter bar
Unimodal: histogram shows ______ peak
Skewed Left: more bars on the left side of the peak… the “tail” on the left is longer than the tail on the right
Skewed Right: more bars on the right side of the peak… the “tail” on the right is longer than the tail on the left
Often a ______ distribution is caused by collecting data from a group of individuals that could have been classified better into two separate groups for that particular data.
Example: heights from a mixed group of men and women
Significant gaps between bars at the left or right can be caused by ______.
These are values that are significantly higher or lower than the rest of your data.
Example: salaries of employees at a major corporation where the CEO makes three times as much as the rest of the workers.
Example A1: Look at Distributions – Textbook pages 50-51
Example A2: Name that Distribution! Powerpoint
B: Dot Plots (Similar to a histogram)
Horizontal axis: shows an appropriate scale for the quantitative data values
Vertically: one dot per occurrence of a particular value
Example B1: A handful of pennies were examined and the year of minting was recorded. The information is recorded on the following dotplot.
In which year were the most pennies minted?
How many pennies were minted after 1996?
How many pennies were there total in this handful?
Describe (using a year range) when you think the majority of the
pennies in this handful were minted.
A dotplot can be created like this too:
A dotplot can be used to tell a story, much like a histogram.
Example B2: Create a dotplot for the following data set.
12, 15, 16, 16, 14, 12, 14, 18, 19, 14, 15, 18, 16, 13, 15, 16, 13, 10, 18, 16
How many numbers? ______
Lowest? ______ Start with: ______
Highest? ______ End with: ______
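The dotplot procedure can be sketched as text: count how often each value occurs, lay out a scale from the lowest to the highest value, and stack one dot per occurrence. In this sketch the scale starts at the lowest value and ends at the highest (an assumption; by hand you may prefer to start and end at rounder numbers), values with no occurrences still get a spot on the scale, and `text_dotplot` is a hypothetical helper name.

```python
from collections import Counter

data = [12, 15, 16, 16, 14, 12, 14, 18, 19, 14,
        15, 18, 16, 13, 15, 16, 13, 10, 18, 16]

def text_dotplot(values):
    """Stack one '*' per occurrence above each value on a horizontal scale."""
    counts = Counter(values)
    scale = range(min(values), max(values) + 1)   # include values with no dots
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):           # build from the top row down
        rows.append(" ".join(" *" if counts[v] >= level else "  " for v in scale))
    rows.append(" ".join(f"{v:>2}" for v in scale))  # axis labels
    return counts, rows

counts, rows = text_dotplot(data)
print("\n".join(rows))
```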
Example B3: Create a dotplot for the data.
C: Frequency Polygon
Sometimes we are interested in a frequency polygon:
Start with your histogram data.
Instead of bars, use a line graph with a dot at the midpoint of each class.
Example C1: Construct a frequency polygon for the following data.
Example C2: Construct a cumulative frequency polygon for the following data.
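Assuming the Example B3 frequency table as stand-in data (the tables for Examples C1 and C2 are not reproduced in these notes), the points to plot can be computed as follows. The frequency polygon puts a dot at each class midpoint, as described above. For the cumulative frequency polygon, the sketch assumes a common textbook convention: plot each running total at its class's upper boundary, starting from 0 at the lower boundary of the first class.

```python
# Example B3 classes, used here only as stand-in data.
lower_limits = [1, 4, 7, 10, 13]
upper_limits = [3, 6, 9, 12, 15]
frequencies = [3, 15, 17, 4, 1]

# Frequency polygon: one dot per class at (midpoint, frequency).
midpoints = [(lo + up) / 2 for lo, up in zip(lower_limits, upper_limits)]
polygon = list(zip(midpoints, frequencies))

# Cumulative frequency polygon (assumed convention): running totals at upper
# class boundaries, starting from 0 at the first lower boundary.
ogive = [(lower_limits[0] - 0.5, 0)]
running = 0
for up, f in zip(upper_limits, frequencies):
    running += f
    ogive.append((up + 0.5, running))

print(polygon)
print(ogive)
```

Connecting the points in order with line segments gives each polygon; the last cumulative point always equals the total number of data values.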
A. Exploratory Data Analysis (EDA)
Exploratory Data Analysis techniques are used to explore a data set, to detect patterns and
extreme data values, to raise new questions, or to pursue leads in many directions.
Useful when data has been gathered for ______.
For example: Ages of Applicants of Graduate Programs
B. Stem-and-Leaf Display
Used for ______ data.
Best with small to medium size sets.
A stem-and-leaf display is used to ______ order and arrange data into groups.
The ______ are aligned vertically from smallest to largest.
A vertical line is drawn to the right of the stems.
The ______ with the same stem are placed in the same row as the stem, arranged in ______ order.
A label (Key) is used to indicate the magnitude of the numbers in the display.
Example B1:
A study on peanut butter reported the following optimal consumption temperatures for various brands:
56 44 62 36 39 53 50 65 45 40
Make a stem-and-leaf display for this data.
Step 1: Identify appropriate stem values.
List smallest to largest. No omissions!
Step 2: List leaves with corresponding stems
In numeric order smallest to largest!
Step 3: Include Key and Title
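Steps 1-3 can be sketched for whole-number data, taking the stem as every digit except the last and the leaf as the last digit. The helper name `stem_and_leaf` is hypothetical. Sorting first keeps the leaves in numeric order, and stepping through every stem from smallest to largest avoids omissions.

```python
def stem_and_leaf(values):
    """Stem = all digits but the last, leaf = last digit (whole numbers assumed)."""
    rows = {}
    for v in sorted(values):                      # sorting puts leaves in numeric order
        rows.setdefault(v // 10, []).append(v % 10)
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # no omitted stems, even if empty
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

temps = [56, 44, 62, 36, 39, 53, 50, 65, 45, 40]
for line in stem_and_leaf(temps):
    print(line)
print("Key: 3 | 6 = 36")
```

Because the stem is `value // 10`, the same helper produces two-digit stems (10 | 6 for 106) for data like Example B2.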
Example B2:
For the following data, use the first two digits as the stem to make a stem-and-leaf display.
106 94 112 96 89 113 90 85 85 100
Example B3: Describe the distributions of the following stem/leaf displays.
C: Stem & Leaf Special Cases
Splitting the Stems
Idea: for lots of data, use TWO (or more) lines (intervals) for each stem instead of one.
Consider:
0 0 1 2 3 3 4 5 5 7 7 8 9 9 9
Using only one stem “0” would give us an
overcrowded graph. Instead of using an interval
of 0-9, maybe we could use TWO intervals.
Example C1: Make a stem-and-leaf display using
a) Two intervals: 0-4, 5-9
b) Five intervals: 0-1, 2-3, 4-5, 6-7, 8-9
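Split stems can be sketched by assigning each leaf to a line within its stem: two lines per stem (leaves 0-4 and 5-9) or five lines per stem (0-1, 2-3, 4-5, 6-7, 8-9). The helper `split_stem_and_leaf` is hypothetical, and the peanut butter temperatures from Example B1 are used only as demonstration data. Every line between the first and last is printed, even if it has no leaves.

```python
def split_stem_and_leaf(values, lines_per_stem):
    """Split each stem into 2 lines (leaves 0-4, 5-9) or 5 lines (0-1, 2-3, ..., 8-9)."""
    if lines_per_stem not in (2, 5):
        raise ValueError("lines_per_stem must be 2 or 5")
    span = 10 // lines_per_stem                 # how many leaf digits share one line
    rows = {}
    for v in sorted(values):                    # sorted, so leaves stay in order
        stem, leaf = divmod(v, 10)
        rows.setdefault((stem, leaf // span), []).append(leaf)
    key, last = min(rows), max(rows)            # (stem, line) tuples sort stem-first
    lines = []
    while key <= last:                          # visit every line, even empty ones
        stem, part = key
        lo, hi = part * span, part * span + span - 1
        leaves = " ".join(str(leaf) for leaf in rows.get(key, []))
        lines.append(f"{stem:>3} ({lo}-{hi}) | {leaves}")
        key = (stem, part + 1) if part + 1 < lines_per_stem else (stem + 1, 0)
    return lines

temps = [56, 44, 62, 36, 39, 53, 50, 65, 45, 40]
for line in split_stem_and_leaf(temps, 2):
    print(line)
```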
Example C2. Britney is a swimmer training for a competition. The numbers of 50-meter laps she swam each day for 30 days are as follows:
a) Prepare a stem-and-leaf plot.
b) Redraw the stem-and-leaf plot using two unit intervals.
c) Make a comment on what these plots show.
Back-To-Back Stem-and-Leaf Plots
If you are comparing two sets of data,
you can use a back-to-back
stem-and-leaf plot.
Example D1: The following class sizes were reported in Economics 101 and Math 151:
Econ 101: 20, 34, 27, 15, 24, 35, 38, 28
Math 151: 14, 18, 21, 34, 29, 13, 32, 23
Make a back-to-back stem-and-leaf plot for the data.
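A back-to-back plot shares one column of stems: the leaves for the first data set go to the left, increasing as they move away from the stem, and the leaves for the second data set go to the right. A minimal sketch with a hypothetical `back_to_back` helper, using the Example D1 class sizes (two-digit whole numbers assumed).

```python
def back_to_back(left_values, right_values):
    """Shared stems in the middle; left leaves increase leftward, right leaves rightward."""
    def leaves_by_stem(values):
        rows = {}
        for v in sorted(values):
            rows.setdefault(v // 10, []).append(str(v % 10))
        return rows

    left, right = leaves_by_stem(left_values), leaves_by_stem(right_values)
    stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)
    # Reverse the left leaves so the smallest leaf sits next to the stem.
    left_texts = [" ".join(reversed(left.get(s, []))) for s in stems]
    pad = max(len(t) for t in left_texts)
    return [f"{lt:>{pad}} | {s} | {' '.join(right.get(s, []))}"
            for lt, s in zip(left_texts, stems)]

econ_101 = [20, 34, 27, 15, 24, 35, 38, 28]
math_151 = [14, 18, 21, 34, 29, 13, 32, 23]
print("Econ 101 | stem | Math 151")
for line in back_to_back(econ_101, math_151):
    print(line)
```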
A: Bar Graph
Clustered Bar Graph: two or more bars for each value on the
horizontal axis, clusters are uniformly spaced
Month / Ave. High Temp (°F), Ft. Myers, FL / Ave. High Temp (°F), Indianapolis, IN
January / 75 / 34
March / 80 / 49
May / 89 / 72
July / 92 / 84
September / 91 / 76
November / 81 / 51
Example A1:
Make a clustered bar graph for the following data.
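As a rough text sketch of a clustered bar graph (bars turned sideways; `clustered_bars` is a hypothetical helper), each month forms one cluster with one bar per city, and a blank line keeps the clusters evenly spaced. Each `=` stands for 5 °F, an arbitrary scale chosen for the sketch; partial units are dropped.

```python
months = ["January", "March", "May", "July", "September", "November"]
ft_myers = [75, 80, 89, 92, 91, 81]        # Ave. high temp (°F), Ft. Myers, FL
indianapolis = [34, 49, 72, 84, 76, 51]    # Ave. high temp (°F), Indianapolis, IN

def clustered_bars(categories, series, scale=5):
    """One cluster per category, one bar per series, one '=' per `scale` degrees."""
    lines = []
    for i, category in enumerate(categories):
        for name, values in series.items():
            bar = "=" * (values[i] // scale)
            lines.append(f"{category:<10} {name:<13} {bar} {values[i]}")
        lines.append("")                   # equal gap between clusters
    return lines

chart = clustered_bars(months, {"Ft. Myers": ft_myers, "Indianapolis": indianapolis})
print("\n".join(chart))
```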
Another type of bar graph is a SEGMENTED BAR GRAPH:
In this graph each bar represents a whole (100%) and is divided proportionally according to the conditional distribution of the other variable within that category.
Example A2:
Use the contingency table to construct a segmented bar graph.
B: Pareto Chart
bars arranged by frequency, highest to lowest
Example B1: A sandwich shop records the number of each kind of sandwich sold last Friday. The numbers are recorded in the chart:
Design a Pareto Chart below for the types of sandwiches sold last Friday.
C: Pie Charts/Circle Graphs
Wedges visually display proportional parts of the total population as a percentage or as a portion of 360°
Good for qualitative/categorical data
The graph should have a title and wedges should be well labeled or have a key/legend.
Josh Sundquist’s Pie Charts for Math Nerds:
How do you make a circle graph by hand?
Step 1: Determine your grand total (if it’s not given)
Step 2: Determine the PERCENTAGE represented by each category
Percentage in each category = # in category ÷ Total (as a decimal; multiply by 100 to write it as a percent)
Step 3: Determine the number of DEGREES represented by each category
Degrees of category = Percentage of category (as a decimal) × 360°
Step 4: Use a PROTRACTOR to mark off the correct number of degrees, one wedge at a time
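Steps 1-3 translate directly into a short calculation. The sketch uses the Example C1 counts; `circle_graph` is a hypothetical helper, and values are rounded to one decimal place for use with a protractor.

```python
def circle_graph(counts):
    """Steps 1-3: grand total, share of the total, and degrees of the 360° circle."""
    total = sum(counts.values())                     # Step 1
    table = {}
    for category, count in counts.items():
        share = count / total                        # Step 2 (as a decimal)
        table[category] = (round(100 * share, 1),    # percent
                           round(360 * share, 1))    # Step 3: degrees
    return total, table

total, wedges = circle_graph({"Male": 15, "Female": 40})
for category, (percent, degrees) in wedges.items():
    print(f"{category}: {percent}% -> {degrees} degrees")
```

Because of rounding, the wedges for other data sets may add to 359.9° or 360.1° instead of exactly 360°.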
Example C1: Let’s start easy
Make a circle graph for the following data
Elementary Teachers of Local Schools
Year / 1995
Male / 15
Female / 40
Example C2: Make a circle graph for the following data
D: Time Series Graph (Line Graph)
Data are plotted in order of occurrence at regular intervals over time. Dots are connected using line segments.
Example D1: Make a time series graph for the following data.
Year / 1990 / 1995 / 2000 / 2005 / 2010
Enrollment / 30 / 34 / 32 / 40 / 52
E: Displaying Data
Determine whether the statement is true or false.
- In a bar graph, the bars do not have to be of uniform width.
- The bars in a bar graph can be vertical or horizontal.
- The lengths of the bars in a bar graph stand for certain values of the variable being displayed.
- When two or more variables are displayed together, the bar graph is called a clustered bar graph (or a comparative bar graph).
- In a Pareto chart, the bars are arranged from left to right according to increasing height.
- A circle graph is also called a pie chart.
- Circle graphs are usually used to display percentages.
- A time series contains the values of a variable taken at regular intervals over a certain time period.
THINK ABOUT IT. / Best for what kind of data? / What can you “see” from the display?
Bar Graph
Pareto Chart
Circle Graphs
Time-Series Graph
Histogram
Dotplot
Stem & Leaf Plot
ALL GRAPHS:
Provide a title, label the axes, and identify units of measure.