Unit 12 – UNIVARIATE STATISTICS
Graphically Representing Data – Day 1
Univariate Statistics – The collection, organization, analysis and interpretation of data that has ______ variable.
Quantitative data (______) on a single variable is often collected in order to understand how a characteristic of a group differs amongst the group members or between groups. When we ask a question like “How old is a typical fast food worker?” it is helpful to take a survey and then see graphically how the ages differ amongst the group.
Exercise #1: Charlie’s Food Factory currently employs 28 workers whose ages are shown below on a dot plot. Answer the following questions based on this plot.
(a) How many of the workers are 18 yrs. old?
(b) What is the range of the workers’ ages?
(c) Would you consider this distribution symmetric?
(d) The mean (average) age for a worker is 22 years old. Why is this average not representative of a typical worker?
Exercise #2: A farm is studying the weight of baby chickens (chicks) after 1 week of growth. They find the weight, in ounces, of 20 chicks. The weights are shown below. Construct a dot plot on the axes given.
2, 1, 3, 4, 2, 2, 3, 1, 5, 3, 4, 4, 5, 6, 3, 8, 5, 4, 6, 3
Would you consider this distribution symmetric?
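A dot plot is a count of how often each value occurs. As a rough sketch (not part of the original notes), the Python snippet below tallies the chick weights from Exercise #2 and prints one row of asterisks per weight. The function name dot_plot_rows is made up for this illustration, and the plot is drawn sideways compared with the axes on the worksheet.

```python
from collections import Counter

# Chick weights in ounces, copied from Exercise #2.
weights = [2, 1, 3, 4, 2, 2, 3, 1, 5, 3, 4, 4, 5, 6, 3, 8, 5, 4, 6, 3]

def dot_plot_rows(data):
    """Return one text row per whole number from min to max, one '*' per occurrence."""
    counts = Counter(data)  # values that never occur count as 0
    return [f"{value:>2} | " + "*" * counts[value]
            for value in range(min(data), max(data) + 1)]

for row in dot_plot_rows(weights):
    print(row)
```

Rows with no asterisks are gaps in the data, which is one thing to look for when judging symmetry.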
Exercise #3: The following histogram shows the ages of the workers at Charlie’s Food Factory (from #1) but in a different format.
(a) How many workers have ages between 19 and 21 years?
(b) What is the disadvantage of a histogram compared to a dot plot?
(c) Does the histogram have any advantages over the dot plot?
Exercise #4: The 2006–2007 Arlington High School Varsity Boys’ basketball team had an excellent season, compiling a record of 15–5 (15 wins and 5 losses). The total points scored by the team in each of the 20 games are listed below in the order in which the games were played:
76, 55, 76, 64, 46, 91, 65, 46, 45, 53, 56, 53, 57, 67, 58, 64, 67, 52, 58, 62
(a) Complete the frequency table below. (b) Construct the histogram below.
Try on your own:
Exercise #5: Consider the following dot plot that shows the age of all the viewers of a particular television show.
Age of Television Show Viewers
1. What does this graph tell us about who watched this television show?
2. Can you make a conclusion about the type of show this data is about?
Quartiles and Box Plots – Day 2
Another visual representation of how a data set is distributed comes in the form of a box plot. We create box plots by dividing the data up roughly into quarters by finding the quartiles of the data set.
Exercise #1: Shown below are the scores 16 students received on a math quiz.
52, 60, 66, 66, 68, 72, 72, 73, 74, 75, 80, 82, 84, 91, 92, 98
(a) What is the median of this data set?
(b) Find the range of the data set.
(c) What is the median of the lower half of this data set (known as the first quartile, Q1)?
(d) What is the median of the upper half of this data set (known as the third quartile, Q3)?
The first and third quartiles are sometimes known as the lower and upper quartiles, respectively. The quartiles, the median, and the minimum and maximum values in a data set make up what is known as the five-number statistical summary, which can be graphically represented on a box plot. This type of plot is also sometimes known as a box-and-whiskers plot.
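The quartile steps from Exercise #1 can be sketched in Python as a way to check hand work. This is a minimal sketch, not the calculator's procedure: the function name five_number_summary is hypothetical, and it assumes that when a data set has an odd number of values, the middle value is left out of both halves. Some textbooks and software use a different convention and can give slightly different quartiles.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    if len(data) < 2:
        raise ValueError("need at least two data values")
    values = sorted(data)
    half = len(values) // 2
    lower_half = values[:half]                 # values below the middle
    upper_half = values[len(values) - half:]   # values above the middle
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])

# Quiz scores from Exercise #1.
quiz_scores = [52, 60, 66, 66, 68, 72, 72, 73, 74, 75, 80, 82, 84, 91, 92, 98]
minimum, q1, med, q3, maximum = five_number_summary(quiz_scores)
print("Min:", minimum, "Q1:", q1, "Median:", med, "Q3:", q3, "Max:", maximum)
```

The five values returned are the five points marked on a box plot: the ends of the whiskers, the two edges of the box, and the line inside the box.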
To find the five-number statistical summary on the calculator: ______
Exercise #2: Using the same data set, construct a box plot on the number line given below.
Minimum = Q1 = Median = Q3 = Maximum =
Exercise #3: The ages of the 15 employees of the Red Hook Curry House are given below:
16, 17, 17, 18, 19, 22, 25, 26, 29, 33, 33, 37, 40, 42, 44
(a) Determine the median and quartile values for this data set.
(b) Create a box plot below.
Exercise #4: Twenty of Mrs. Sullivan’s Algebra students recently took a quiz. The results of this quiz are shown in the following box plot. Assume that all scores are whole numbers.
Scores on Mrs. Sullivan’s Algebra Quiz
Grades
(a) What was the median score on Mrs. Sullivan’s quiz?
(b) What was the range of the scores on Mrs. Sullivan’s quiz?
(c) What score was greater than or equal to 75% of all other scores on this quiz?
(d) Mrs. Sullivan regularly sets the passing grade on her quizzes to be the score of the lower quartile. What is the passing grade on this quiz?
Try on your own:
Exercise #5: Which of the following shows a data set with the greatest median? EXPLAIN!!
Measures of Central Tendency – Day 3
In our day to day activities, we deal with many problems that involve related items of numerical information called data. Statistics is the study of sets of such numerical data. When we gather numerical data, besides displaying it, we often want to know a single number that is representative of the data as a whole. We call these types of numbers measures of central tendency. The two most common measures of central tendency are the mean and the median.
Exercise #1: A survey was taken amongst 12 people on the number of passwords they currently have to remember. The results in ascending order are shown below. State the median number of passwords and the mean number of passwords (to the nearest tenth).
0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 6
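The two measures can also be computed step by step. The sketch below is one way to check a hand calculation for Exercise #1; the helper names mean and median are written out here for illustration rather than taken from the notes.

```python
def mean(data):
    """Add up all the values and divide by how many there are."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the sorted data; average the two middle values if the count is even."""
    values = sorted(data)
    mid = len(values) // 2
    if len(values) % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

# Number of passwords reported by the 12 people in Exercise #1.
passwords = [0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 6]
print("Mean (nearest tenth):", round(mean(passwords), 1))
print("Median:", median(passwords))
```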
Exercise #2: Students in Mr. Buchholz’s Algebra class were trying to determine if people speed along a certain section of roadway. They collected speeds of 20 vehicles, as displayed in the table below.
(a) Find the mean and median for this data set.
(b) The speed limit along this part of the highway is 35 mph. Based on your results from part (a), is it fair to conclude that the average driver speeds on this roadway?
To find measures of central tendency in the calculator:
When conducting a statistical study, it is not always possible to obtain information about every person or situation to which the study applies. Unlike a census, in which every person is counted, some studies use only a sample, or portion, of the items being investigated. Whenever a sample is taken, it is vital that it be fair; in other words, the sample must reflect the overall population.
Exercise #3: To determine which television programs are the most popular in a large city, a poll is conducted by selecting a sample of people at random and interviewing them. Outside which of the following locations would the interviewer be most likely to find a fair sample? Explain your choice and why the others are inappropriate.
(1) A baseball stadium
(2) A concert hall
(3) A grocery store
(4) A comedy club
Exercise #4: Truong is trying to determine the average height of high school male students. Because he is on the basketball team, he uses the heights of the 14 players on the team, which are given below in inches.
69, 70, 72, 72, 74, 74, 74, 75, 76, 76, 76, 77, 77, 82
(a) Calculate the mean and median for this data set. Round any non-integer answers to the nearest tenth.
(b) Is the data set above a fair sample to use to determine the average height of high school male students? Explain your answer.
Try on your own:
Exercise #5: In Ms. Faber’s Geometry Course, eight students recently took a test. Their grades were as follows:
45, 78, 82, 85, 87, 89, 93, 95
(a) Calculate the mean and median of this data set.
(b) Which value, the mean or the median, is a better measure of how well the average student did on Ms. Faber’s test?
Measures of Central Tendency (Cont’d) – Day 4
Measures of central tendency give us numbers that describe the typical data value in a given data set. But, they do not let us know how much variation there is in the data set. Two data sets can have the same mean but look radically different depending on how varied the numbers are in the set.
Exercise #1: The two data sets below each have equal means but differ in the variation within the data set. Use your calculator to determine the Interquartile Range (IQR) of each data set. The IQR is defined as the difference between the third quartile value and the first quartile value.
Data Set #1: 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10, 10, 11, 11
Data Set #2: 5, 5, 6, 6, 7, 7, 8, 8, 9, 9
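The definition of the IQR translates directly into a short computation. The sketch below is only an illustration: the function name interquartile_range is made up, and it assumes the quartiles are the medians of the lower and upper halves, with the middle value left out of both halves when the count is odd. A calculator or other software that uses a different quartile convention may give a slightly different result.

```python
from statistics import median

def interquartile_range(data):
    """IQR = Q3 - Q1, with quartiles found as medians of the two halves."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])                  # median of the lower half
    q3 = median(values[len(values) - half:])    # median of the upper half
    return q3 - q1

data_set_1 = [3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10, 10, 11, 11]
data_set_2 = [5, 5, 6, 6, 7, 7, 8, 8, 9, 9]

print("Means:", sum(data_set_1) / len(data_set_1), sum(data_set_2) / len(data_set_2))
print("IQR of Data Set #1:", interquartile_range(data_set_1))
print("IQR of Data Set #2:", interquartile_range(data_set_2))
```

Printing the means alongside the IQRs makes the point of the exercise visible: the centers match while the spreads do not.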
The interquartile range gives a good measure of how spread out the data set is. But, the best measure of variation within a data set is the standard deviation. The actual calculation of standard deviation is complex and we will not go into it here. We will rely on our calculators for its calculation.
Exercise #2: Using the same data sets above, use your calculator to produce the standard deviation (shown as on the calculator) of each set. Round to the nearest tenth.
Data Set #1: Data Set #2:
Standard Deviation: Tells us, on average, how far a data point is away from the mean of the data set. The larger the standard deviation, the greater the variation within the data set.
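The notes leave the calculation to the calculator, but a short sketch can make the idea of "distance from the mean" concrete. The version below is an assumption on our part: it is the population standard deviation, which divides by the number of values. Many calculators also report a sample version that divides by one less than the number of values, so check which one your class uses.

```python
from math import sqrt

def standard_deviation(data):
    """Population standard deviation (assumed version): divides by the number of values."""
    mean = sum(data) / len(data)
    squared_distances = [(x - mean) ** 2 for x in data]  # squared distance of each point from the mean
    return sqrt(sum(squared_distances) / len(data))

# The two data sets from Exercise #1, which share the same mean.
data_set_1 = [3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10, 10, 11, 11]
data_set_2 = [5, 5, 6, 6, 7, 7, 8, 8, 9, 9]

print("Data Set #1:", round(standard_deviation(data_set_1), 1))
print("Data Set #2:", round(standard_deviation(data_set_2), 1))
```

Because the distances are squared before averaging, values far from the mean raise the standard deviation more than values close to it.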
Exercise #3: A farm is studying the weight of baby chickens (chicks) after 1 week of growth. They find the weight, in ounces, of 20 chicks. The weights are shown below. Find the mean, the interquartile range and the standard deviation for this data set. Round any non-integer values to the nearest tenth. Include appropriate units in your answers. Give an interpretation of the standard deviation.
2, 1, 3, 4, 2, 2, 3, 1, 5, 3, 4, 4, 5, 6, 3, 8, 5, 4, 6, 3
Exercise #4: A marketing company is trying to determine how much diversity there is in the age of people who drink different soft drinks. They take a sample of people and ask them which soda they prefer. For the two sodas, the age of those people who preferred them is given below.
Soda A: 18, 16, 22, 16, 28, 18, 21, 38, 22, 29, 25, 44, 36, 27, 40
Soda B: 25, 22, 18, 30, 27, 19, 22, 28, 25, 19, 23, 29, 26, 18, 20
(a) Which soda appears to have a greater diversity in the age of people who prefer it? How did you decide on this?
(b) Explain why standard deviation is a better measure of the diversity in age than the mean.
(c) Use your calculator to determine the standard deviation, given as, for both data sets. Round your answers to the nearest tenth. Did this answer reinforce your pick from (a)? How?
(d) How many people fell within one standard deviation of the mean for Soda A? How about for Soda B?
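Part (d) asks for a count of the ages that lie between (mean minus one standard deviation) and (mean plus one standard deviation). A minimal sketch of that check follows. The function name count_within_one_sd is hypothetical, and the sketch assumes the population standard deviation (statistics.pstdev); ages that fall exactly on a boundary are counted as inside.

```python
from statistics import mean, pstdev

def count_within_one_sd(data):
    """Count the values between mean - SD and mean + SD, boundaries included."""
    center = mean(data)
    spread = pstdev(data)  # population standard deviation (an assumption of this sketch)
    low, high = center - spread, center + spread
    return sum(1 for x in data if low <= x <= high)

soda_a = [18, 16, 22, 16, 28, 18, 21, 38, 22, 29, 25, 44, 36, 27, 40]
soda_b = [25, 22, 18, 30, 27, 19, 22, 28, 25, 19, 23, 29, 26, 18, 20]

print("Soda A:", count_within_one_sd(soda_a), "of", len(soda_a))
print("Soda B:", count_within_one_sd(soda_b), "of", len(soda_b))
```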
Try on your own:
Exercise #5: Which of the following data sets would have a standard deviation closest to zero? Do this without your calculator. Explain how you arrived at your answer.
(1) {-5, -2, -1, 0, 1, 2, 5}
(2) {5, 8, 10, 16, 20}
(3) {11, 11, 12, 13, 13}
(4) {3, 7, 11, 11, 11, 18}
The Distribution of Data – Day 5
Skewness – When the spread of data is ______. In our graphs, data can be skewed left, skewed right, or symmetrical. The way we describe the distribution of our graphs can dictate what measure of central tendency best represents our data.
Exercise #1: 30 students from Lancaster High were surveyed to determine how many pets they owned. The data was then graphed on a dot plot shown below.
How many pets do you own?
(a) Calculate the mean, median, and mode for the number of pets owned by the 30 students in this class.
(b) Describe the distribution of the data and explain why. What does it tell us about the data?
(c) Which measure of central tendency best represents the data?
Exercise #2: Twenty-six students from the senior class participated in a walk-a-thon to raise money for the school’s band program. The data below shows the number of miles each student walked. Create a dot plot from this data.
6, 3, 5, 7, 8, 8, 7, 6, 5, 9, 8, 10, 6, 11, 4, 6, 7, 5, 9, 6, 7, 9, 8, 6, 5, 7
Calculate the mean and median for the data. Which would best represent the typical distance for the seniors and why?
Exercise #3: The heights, in inches, of 20 students in Mrs. Millonzi’s class are shown in the following list. Create a box plot for this data below.
53, 60, 61, 63, 64, 65, 65, 65, 65, 66, 66, 67, 67, 68, 69, 70, 70, 71, 71, 73
Describe the distribution of the data. Is it skewed left, skewed right, or symmetrical? Why?
Exercise #4: Ms. Snyder graded a set of 32 Algebra test papers. The grades earned by the students were as follows:
48, 85, 53, 42, 65, 62, 47, 95, 50, 82, 50, 58, 77, 93, 73, 55,
43, 66, 45, 44, 50, 49, 78, 70, 55, 95, 80, 78, 83, 81, 51, 60
Test Scores | Tally | Frequency
41 – 50 | |
51 – 60 | |
61 – 70 | |
71 – 80 | |
81 – 90 | |
91 – 100 | |
Total Frequency:
A. Complete the table and construct a frequency histogram below.
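Filling in the frequency column means counting how many scores land in each interval. The sketch below does that count for the intervals in the table and prints a rough text version of the histogram; the function name frequency_table is made up for this illustration.

```python
# Test grades from Exercise #4.
test_scores = [48, 85, 53, 42, 65, 62, 47, 95, 50, 82, 50, 58, 77, 93, 73, 55,
               43, 66, 45, 44, 50, 49, 78, 70, 55, 95, 80, 78, 83, 81, 51, 60]

# Score intervals from the table; both endpoints are included.
intervals = [(41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]

def frequency_table(data, intervals):
    """Map each (low, high) interval to the number of data values inside it."""
    return {(low, high): sum(1 for x in data if low <= x <= high)
            for (low, high) in intervals}

table = frequency_table(test_scores, intervals)
for (low, high), count in table.items():
    print(f"{low:>3} - {high:<3} | {'#' * count} ({count})")
print("Total frequency:", sum(table.values()))
```

The total frequency should equal the number of test papers, which is a quick way to catch a missed or double-counted score.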
Try on your own:
Describe the distribution of the graph. Is it symmetric, skewed right, or skewed left? Why?
Comparing Sets of Data – Day 6
Outliers – Data sets can have members that are far away from all of the rest of the data set. These elements are called outliers (sounds like outsiders). An outlier can pull the mean toward itself, so that the mean no longer represents a typical value in the data set.
Let’s go back and look at a question from Notes, Day 3…
Exercise #1: In Ms. Faber’s Geometry Course, eight students recently took a test. Their grades were as follows:
45, 78, 82, 85, 87, 89, 93, 95
(a) Calculate the mean and median of this data set.
(b) Create a box plot to represent this data.
(c) What do we call the test score of 45 for this data set?
(d) Describe the distribution of the box plot and explain why.
(e) Which value, the mean or the median, is a better measure of how well the typical student did on Ms. Faber’s test?
(f) Calculate the mean and median of the data using only the seven highest test scores. Which was most affected by the score of 45?
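The comparison in part (f) can be sketched in a few lines as a check on hand work: compute both measures with and without the lowest score and see how far each one moves. The variable names are chosen for this illustration only.

```python
from statistics import mean, median

# Test grades from Exercise #1.
grades = [45, 78, 82, 85, 87, 89, 93, 95]
top_seven = sorted(grades)[1:]  # the same grades with the lowest score (45) removed

mean_shift = mean(top_seven) - mean(grades)
median_shift = median(top_seven) - median(grades)

print("All eight scores: mean =", mean(grades), " median =", median(grades))
print("Seven highest:    mean =", mean(top_seven), " median =", median(top_seven))
print("Change in mean:", mean_shift, " Change in median:", median_shift)
```

Whichever measure changes more when the 45 is removed is the one more sensitive to the outlier.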
Here are a couple of past statistics exam questions for you to try:
1) The two sets of data below represent the number of runs scored by two different youth baseball teams over the course of a season.
Team A: 4, 8, 5, 12, 3, 9, 5, 2
Team B: 5, 9, 11, 4, 6, 11, 2, 7
Which set of statements about the mean and standard deviation is true?
(1) mean A mean B
standard deviation A standard deviation B
(2) mean A mean B
standard deviation A standard deviation B
(3) mean A mean B
standard deviation A standard deviation B
(4) mean A mean B
standard deviation A standard deviation B
2) Robin collected data on the number of hours she watched television on Sunday through Thursday nights for a period of 3 weeks. The data are shown in the table below.
Using an appropriate scale on the number line below, construct a box plot for the 15 values.