Statistics
Notes - 2.1
Organizing Data
frequency distribution- A table that shows the frequency with which each item appears in a set of data.
ungrouped frequency distribution- each value in the distribution stands alone
Ex. – Twenty-five army inductees were given a blood test to determine their blood type. The data set is as follows. Construct an ungrouped frequency distribution for the data.
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
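The tally for this example can be sketched in Python. This is only an illustrative sketch, not part of the original notes: it assumes the data are read row by row with five blood types per row, and the variable names are made up for the example.

```python
from collections import Counter

# Blood types of the 25 inductees, read row by row
# (five values per row is an assumption about the layout).
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

# Each distinct value stands alone as its own class.
frequency = Counter(blood_types)

print("Class  Frequency")
for blood_type in ("A", "B", "O", "AB"):
    print(f"{blood_type:<6} {frequency[blood_type]}")
```

The frequencies should add up to 25, the number of inductees, which is a quick check on the tally.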
grouped frequency distribution- frequency for classes is displayed
Guidelines for Constructing a Grouped Frequency Distribution
1. Each class should be the same width.
2. Classes should be set up so that they do not overlap and so that each piece of data belongs to exactly one class.
(Guidelines 3, 4, and 5 are helpful, but not necessary)
3. For the exercises given in this textbook, 5 to 12 classes are most desirable, since all samples contain fewer than 125 data values.
4. When it is convenient, an odd class width is often advantageous.
5. Use a system that takes advantage of a number pattern, to guarantee accuracy.
Procedure for Selecting Classes for a Grouped Frequency Distribution
1. Identify the high and the low scores (H = 98, L = 39) and find the range.
Range = H - L = 98 - 39 = 59.
2. Select a number of classes (m = 9) and a class width (c = 7) so that the product (mc = 63) is a bit larger than the range (range = 59).
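3. Pick a starting point. This starting point should be a little smaller than the lowest score L. Suppose that we start at 35; counting from there by 7s (the class width) we get 35, 42, 49, 56, ..., 98. These are called the lower class limits. (Counting all the way to 98 gives ten lower limits, not nine: nine classes starting at 35 cover only 35 through 97, so the high score of 98 needs a tenth class, 98 - 104.)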
lower class limit - the smallest value that can go into each class.
upper class limit - the largest value that can go into each class.
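The arithmetic in the procedure can be sketched as follows. This is a minimal sketch that assumes whole-number scores, so each upper class limit is one less than the next lower class limit; the variable names are illustrative.

```python
# Values taken from the procedure above.
high, low = 98, 39
num_classes, class_width = 9, 7
start = 35

data_range = high - low            # step 1: range = H - L
span = num_classes * class_width   # step 2: product mc

# Step 3: count from the starting point by the class width,
# up to and including the high score.
lower_limits = list(range(start, high + 1, class_width))

# Largest whole number that fits in each class (whole-number data assumed).
upper_limits = [lower + class_width - 1 for lower in lower_limits]

print("range =", data_range, " mc =", span)
for lower, upper in zip(lower_limits, upper_limits):
    print(f"{lower} - {upper}")
```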
Notes
1. At a glance you can check the number pattern to determine whether the arithmetic used to form the classes was correct.
2. The class width is the difference between a lower class limit and the next lower class limit. (It is not the difference between the lower and upper limit of the same class.)
3. Class boundaries are numbers that do not occur in the sample data but are halfway between the upper limit of one class and the lower limit of the next class.
4. The class mark is the numerical value that is exactly in the middle of each class.
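A short worked check of notes 2 through 4, using the first classes 35 - 41, 42 - 48, and 49 - 55. This is a sketch for illustration; the variable names are not from the notes.

```python
lower_limits = [35, 42, 49]
upper_limits = [41, 48, 55]

# Note 2: class width = lower limit of the next class - lower limit of this class.
class_width = lower_limits[1] - lower_limits[0]
# This is NOT the width: upper limit - lower limit of the same class.
not_the_width = upper_limits[0] - lower_limits[0]

# Note 3: the boundary is halfway between one upper limit and the next lower limit.
boundary = (upper_limits[0] + lower_limits[1]) / 2

# Note 4: the class mark is the middle of the class.
class_mark = (lower_limits[0] + upper_limits[0]) / 2

print("class width:", class_width)
print("upper limit - lower limit:", not_the_width)
print("boundary between first two classes:", boundary)
print("class mark of first class:", class_mark)
```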
Use the following data to construct a grouped frequency distribution and answer the
following questions.
Statistic Exam Scores
60 47 82 95 88 72 67 66 68 98
90 77 86 58 64 95 74 72 88 74
77 39 90 63 68 97 70 64 70 70
58 78 89 44 55 85 82 83 72 77
72 86 50 94 92 80 91 75 76 78
Use classes 35 - 41, 42 - 48 etc.
lower class limit =
upper class limit =
class width =
class boundaries =
class mark =
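One way to check a hand tally of the exam scores is sketched below. It assumes the classes 35 - 41, 42 - 48, etc. continue by 7s until the highest score is covered; the helper arithmetic and names are illustrative only.

```python
scores = [
    60, 47, 82, 95, 88, 72, 67, 66, 68, 98,
    90, 77, 86, 58, 64, 95, 74, 72, 88, 74,
    77, 39, 90, 63, 68, 97, 70, 64, 70, 70,
    58, 78, 89, 44, 55, 85, 82, 83, 72, 77,
    72, 86, 50, 94, 92, 80, 91, 75, 76, 78,
]
start, class_width = 35, 7

# One class per lower limit, continuing until the highest score is covered.
lower_limits = list(range(start, max(scores) + 1, class_width))
frequency = {lower: 0 for lower in lower_limits}

for score in scores:
    # Lower limit of the class this score belongs to.
    lower = start + (score - start) // class_width * class_width
    frequency[lower] += 1

print("Class      Frequency")
for lower in lower_limits:
    upper = lower + class_width - 1
    print(f"{lower:>3} - {upper:<4} {frequency[lower]}")
```

Because the classes do not overlap, each score is counted exactly once, so the frequencies should total 50.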
Statistics
Notes - 2.2
Histograms, Frequency Polygons, and Ogives
Histogram - a type of bar graph representing an entire set of data.
A histogram is made up of:
1. A title, which identifies the population of concern
2. A vertical scale, which identifies the frequencies in the various classes.
3. A horizontal scale, which identifies the variable x. Values for the class boundaries, class limits, or class marks may be labeled along the x-axis. Use whichever one of these sets of class numbers best presents the variable.
Ex. Create a histogram for the Statistics Exam Scores Data:
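A rough text version of the histogram can be sketched as below, drawn sideways so each class is one row. The frequencies are the tally of the 50 exam scores in classes 35 - 41, 42 - 48, etc.; the title wording and the use of asterisks are assumptions made for the sketch.

```python
# Lower class limits and the tallied frequencies for the exam scores.
lower_limits = list(range(35, 99, 7))
frequencies = [1, 2, 2, 3, 7, 11, 9, 8, 6, 1]
class_width = 7

rows = []
for lower, freq in zip(lower_limits, frequencies):
    upper = lower + class_width - 1
    # One asterisk per score in the class.
    rows.append(f"{lower:>3} - {upper:<3} | {'*' * freq} ({freq})")

print("Statistics Exam Scores (50 scores)")  # title
print("\n".join(rows))                       # classes on one axis, frequency as bar length
```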
Describing histograms
Notes
1. The mode is the value of the piece of data that occurs with the greatest frequency.
2. The modal class is the class with the highest frequency.
3. A bimodal frequency distribution has two high-frequency classes separated by classes with lower frequencies.
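The difference between the mode and the modal class can be checked on the exam scores. This is a sketch, assuming the classes 35 - 41, 42 - 48, etc.; `class_lower_limit` is a hypothetical helper written for this example.

```python
from collections import Counter
from statistics import multimode

scores = [
    60, 47, 82, 95, 88, 72, 67, 66, 68, 98,
    90, 77, 86, 58, 64, 95, 74, 72, 88, 74,
    77, 39, 90, 63, 68, 97, 70, 64, 70, 70,
    58, 78, 89, 44, 55, 85, 82, 83, 72, 77,
    72, 86, 50, 94, 92, 80, 91, 75, 76, 78,
]

# Mode: the data value(s) that occur most often.
modes = multimode(scores)

def class_lower_limit(score, start=35, width=7):
    """Lower limit of the class containing score (hypothetical helper)."""
    return start + (score - start) // width * width

# Modal class: the class with the highest frequency.
class_counts = Counter(class_lower_limit(s) for s in scores)
modal_lower, modal_frequency = class_counts.most_common(1)[0]

print("mode(s):", modes)
print(f"modal class: {modal_lower} - {modal_lower + 6} (frequency {modal_frequency})")
```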
Frequency Polygons
Frequency Polygon - a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points.
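The points to plot for a frequency polygon can be listed as in the sketch below, using the exam score classes 35 - 41, 42 - 48, etc. Adding a zero-frequency point one class width beyond each end, so the polygon closes on the axis, is a common convention assumed here rather than something stated in these notes.

```python
lower_limits = list(range(35, 99, 7))
frequencies = [1, 2, 2, 3, 7, 11, 9, 8, 6, 1]
class_width = 7

# Midpoint (class mark) of each class: halfway between its limits.
class_marks = [(lower + (lower + class_width - 1)) / 2 for lower in lower_limits]

# Points to plot: (class mark, frequency).
points = list(zip(class_marks, frequencies))

# Assumed convention: close the polygon with a zero-frequency
# point one class width beyond each end.
points = ([(class_marks[0] - class_width, 0)]
          + points
          + [(class_marks[-1] + class_width, 0)])

for mark, freq in points:
    print(mark, freq)
```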
Ex. Create a frequency polygon for the Statistics Exam Scores Data:
Ogives
Ogive - a cumulative frequency graph
Cumulative frequency distribution - a frequency distribution in which the frequencies are replaced with cumulative frequencies.
The cumulative frequency for any given class is the sum of the frequency for that class and the frequencies of all classes of smaller values.
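The running sum described above can be sketched with `itertools.accumulate`. The frequencies are the tally of the exam scores in classes 35 - 41, 42 - 48, etc.; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for the exam scores, smallest class first.
frequencies = [1, 2, 2, 3, 7, 11, 9, 8, 6, 1]

# Each cumulative frequency = this class's frequency + all smaller classes.
cumulative = list(accumulate(frequencies))

for freq, cum in zip(frequencies, cumulative):
    print(freq, cum)
```

The last cumulative frequency should equal the number of data values, here 50.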
Components of an Ogive
1. A title, which identifies the population.
2. A vertical scale, which identifies either the cumulative frequencies or the cumulative relative frequencies.
3. A horizontal scale, which identifies the upper class boundaries. Until the upper boundary of a class has been reached, you cannot be sure you have accumulated all the data in that class. Therefore, the horizontal scale for an ogive is always based on the upper class boundaries.
Every ogive starts on the left with a cumulative relative frequency of zero at the lower class boundary of the first class and ends on the right with a cumulative relative frequency of 100% at the upper class boundary of the last class.
Ex. Create an ogive for the Statistics Exam Scores Data:
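The points for the ogive can be listed as in this sketch. It assumes whole-number scores, so each class boundary lies 0.5 beyond the class limit, and it uses cumulative relative frequencies in percent on the vertical scale.

```python
from itertools import accumulate

lower_limits = list(range(35, 99, 7))
frequencies = [1, 2, 2, 3, 7, 11, 9, 8, 6, 1]
class_width = 7
total = sum(frequencies)

# Upper class boundary of each class (whole-number data assumed).
upper_boundaries = [lower + class_width - 0.5 for lower in lower_limits]

# Cumulative relative frequency, in percent, reached at each upper boundary.
cumulative_percent = [100 * c / total for c in accumulate(frequencies)]

# Start at 0% at the lower boundary of the first class.
points = [(lower_limits[0] - 0.5, 0.0)] + list(zip(upper_boundaries, cumulative_percent))

for boundary, percent in points:
    print(boundary, percent)
```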
Statistics
Notes - 2.3
Graphs and Stem-and-Leaf Displays
Graphs and other similar displays are designed to visually reveal patterns of behavior of the variable being studied.
1. Bar Graphs – use bars to represent the frequencies of a distribution
- Bar graphs are used to compare quantities.
Ex. The following data shows the number of each type of operation performed at General Hospital last year:
Type of Operation             Number of Cases
Thoracic                      20
Bones and joints              45
Eye, ear, nose, and throat    58
General                       98
Abdominal                     115
Urologic                      74
Proctologic                   65
Neurosurgery                  23
Bar Graph
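A horizontal text bar graph of the operations data can be sketched as below. The scale of one `#` per 5 cases (rounded) is an assumption chosen only to keep the bars short.

```python
cases = {
    "Thoracic": 20,
    "Bones and joints": 45,
    "Eye, ear, nose, and throat": 58,
    "General": 98,
    "Abdominal": 115,
    "Urologic": 74,
    "Proctologic": 65,
    "Neurosurgery": 23,
}
CASES_PER_SYMBOL = 5  # assumed scale for the sketch

# Bar length is proportional to the number of cases.
bars = {name: "#" * round(count / CASES_PER_SYMBOL) for name, count in cases.items()}

print("Operations performed at General Hospital last year")
for name, count in cases.items():
    print(f"{name:<27} {bars[name]} {count}")
```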
Alternatives to the “typical” Vertical Bar Graph
1. Horizontal Bar Graph
2. Compound Bar Graph – comparison of two sets of data
3. Component Bar Graph – bars placed above each other or end to end to show the components that make up the whole
2. Circle Graphs – a circle that is divided into sections or wedges according to the percentages of frequencies in each category of the distribution
- Circle graphs are used to compare percentages (parts of the whole).
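The size of each wedge can be worked out as in the sketch below. The notes give no data for a circle graph, so reusing the operations data from the bar graph example is an assumption made for illustration.

```python
cases = {
    "Thoracic": 20,
    "Bones and joints": 45,
    "Eye, ear, nose, and throat": 58,
    "General": 98,
    "Abdominal": 115,
    "Urologic": 74,
    "Proctologic": 65,
    "Neurosurgery": 23,
}
total = sum(cases.values())

# Percentage of the whole for each category.
percent = {name: 100 * count / total for name, count in cases.items()}
# Central angle of each wedge: the same share of 360 degrees.
degrees = {name: 360 * count / total for name, count in cases.items()}

for name in cases:
    print(f"{name:<27} {percent[name]:5.1f}%  {degrees[name]:6.1f} degrees")
```

The percentages should total 100 and the angles 360, apart from rounding.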
3. Stem and Leaf Plot – a data plot that uses part of a data value as the stem and part of the data value as the leaf to form groups or classes.
Ex. A sample of 19 exam scores was randomly selected from a large class.
76 74 82 96 66 76 78 72 52 68
86 84 62 76 78 92 82 74 88
Create a stem and leaf plot for the data.
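The plot can be sketched in Python by splitting each score into a tens digit (stem) and a ones digit (leaf). Sorting the leaves within each stem is a common choice assumed here.

```python
from collections import defaultdict

scores = [76, 74, 82, 96, 66, 76, 78, 72, 52, 68,
          86, 84, 62, 76, 78, 92, 82, 74, 88]

stems = defaultdict(list)
for score in sorted(scores):
    stem, leaf = divmod(score, 10)  # tens digit is the stem, ones digit the leaf
    stems[stem].append(leaf)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```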
4. Time Series Graph – represents data that occurs over a specified period of time.
Ex. A mine safety engineer wishes to use the following data for a presentation showing how the number of deaths from surface mining has changed over the years. Draw a time series graph for the data.
Year    Number of Deaths
1930    46
1940    24
1950    32
1960    10
1970    8
1980    1
1990    3
As with a bar graph, two data sets can be compared on the same graph by using two lines.
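A crude text version of the time series graph for the mine safety data is sketched below: the years run in order down the page and the `*` marks the value, so its position traces the line. The layout and the list of changes between readings are assumptions added for illustration.

```python
deaths_by_year = {1930: 46, 1940: 24, 1950: 32, 1960: 10,
                  1970: 8, 1980: 1, 1990: 3}

# Time must be kept in order for a time series graph.
years = sorted(deaths_by_year)

# Change from each reading to the next.
changes = [deaths_by_year[later] - deaths_by_year[earlier]
           for earlier, later in zip(years, years[1:])]

for year in years:
    deaths = deaths_by_year[year]
    # The marker sits 'deaths' columns to the right of the axis.
    print(f"{year} |{' ' * deaths}* {deaths}")

print("changes between readings:", changes)
```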
5. Pictograph – a graph that uses symbols or pictures to represent data.
Ex. Create a pictograph to display the mine safety data.
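A pictograph needs a key that says how much each symbol stands for. The sketch below assumes a key of one `X` for 2 deaths and one `x` for 1 death (a half symbol); both the key and the symbols are choices made for this example.

```python
deaths_by_year = {1930: 46, 1940: 24, 1950: 32, 1960: 10,
                  1970: 8, 1980: 1, 1990: 3}

# Assumed key: X = 2 deaths, x = 1 death (half symbol).
rows = {}
for year, deaths in deaths_by_year.items():
    full_symbols, half_symbol = divmod(deaths, 2)
    rows[year] = "X" * full_symbols + "x" * half_symbol

print("Deaths from surface mining   (X = 2 deaths, x = 1 death)")
for year in sorted(rows):
    print(f"{year}  {rows[year]}")
```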