Part I: True/False and Multiple Choice (2 points each)
True/False questions. Circle true or false following each statement.
- The median of a density curve is always the point that divides the area under the curve in half.
True False
- Simpson’s paradox results from a variable omitted in the pooled table acting as a lurking variable.
True False
- A low value of r² indicates that we should not use the regression to describe the association between the independent and dependent variables.
True False
- Lurking variables are one reason we cannot infer causality from an association between two variables.
True False
- The height of a density curve for a range of values gives the proportion of observations that fall under the density curve for that range of values.
True False
Part II: Multiple Choice (2 points each)
Circle the correct answer below. There is one correct answer to each question. Also, note that many questions have “all of the above” or “none of the above” choices.
- As part of a survey of college students, a researcher is interested in the number of cigarettes smoked per day. She records a 1 if the student does not smoke, a 2 if the student smokes at least once a week but not every day, a 3 if the student smokes at least one cigarette per day, and a 4 if the student smokes more than a pack a day. This variable is
a) ordered categorical
b) quantitative
c) unordered categorical
d) All of the above.
- A description of different houses on the market includes the following three variables. Which of the variables is quantitative?
a) The square footage of the house
b) The monthly electric bill
c) The monthly gas bill
d) All of the above.
- When drawing a histogram, it is important to
a) have a separate bin for each observation to get the most informative plot.
b) make sure the heights of the bars exceed the widths of the bins so that the bars are true rectangles.
c) label the vertical axis so that the reader can determine the counts or percent in each bin.
d) make certain the mean and median are contained in the same bin interval, so that the correct type of skewness can be identified.
- If a histogram has a bar that is taller than the others, then
a) the bar corresponds to the bin containing the most observations
b) this is suggestive of a skewed distribution
c) the bin for this bar should be shortened for the sake of symmetry
d) all of the above.
Use the following to answer questions 5-7:
The following histogram represents the distribution of acceptance rates (percent accepted) among 25 business schools in 1997. In each bin, the left endpoint is included but not the right.
- What percent of the schools have an acceptance rate of under 20%?
a) 3%   b) 4%   c) 12%   d) 16%
- What is the approximate width of each bin in this graph?
a) 10
b) 5
c) 3
d) none of the above could plausibly be the width of the bin.
- Which of the following intervals includes the median of this distribution?
a) 30 to 40
b) 20 to 30
c) 15 to 25
d) cannot be determined from the information given
Use the following box plot of the exam scores in a statistics class to answer questions 8-10. The box plot is drawn per Moore (i.e., not per StataQuest).
[Figure: box plot of exam scores; vertical axis marked 30, 45, 60, 75, 90]
- Approximately 25% of the students scored below
a) 90   b) 65   c) 75   d) 60
- The interquartile range of the exam scores is approximately
a) 14   b) 55   c) 65   d) 5
- The maximum exam score is approximately
a) 75   b) 60   c) 65   d) 90
- For the density curve below, which of the following is true?
[Figure: density curve; horizontal axis marked 0.00, 0.25, 0.50, 0.75, 1.00]
a) the density curve is symmetric
b) the median is 0.5
c) the mean is 0.5
d) all of the above
- If removing an observation from a data set would markedly change the position of the regression line fit to the data, the point is called
a) robust   b) a residual   c) influential   d) a response
- Using data from the fifty states, a researcher calculates the correlation coefficient between X, the state's infant mortality rate in 1990 (deaths per 1000), and Y, the percentage of the state's 18-year-olds in 1990 who had graduated from high school. The correlation between X and Y is r = -0.54. If, instead of plotting these variables for each of the fifty states, we plotted their values for each county in the United States, we would expect the value of the correlation r to be
a) exactly the same
b) smaller (closer to zero)
c) +0.54 (the magnitude is the same, but the sign should change)
d) larger in magnitude (closer to –1)
- Consider the following scatterplot.
[Figure: scatterplot of Y (vertical axis marked 20, 40, 60) against X (horizontal axis marked 15, 20, 25, 30)]
The correlation between X and Y
a) is approximately 0.999
b) is approximately 0.8
c) is approximately 0.0
d) cannot be computed because there is an outlier in the plot
- In a statistics class with 136 students, the professor records how much money each student has in her or his possession during the first class of the semester. The histogram below shows the data collected.
[Figure: histogram of the amount of money in $ (horizontal axis from 0 to 100 in steps of 10); vertical axis marked 10 to 50]
From the histogram, which of the following is true?
a) The mean is much larger than the median.
b) The mean is much smaller than the median.
c) The mean and the median are approximately equal.
d) It is impossible to compare the mean and the median for these data.
- X and Y are two categorical variables. The best way to determine if there is a relation between them is to
a) calculate the correlation between X and Y.
b) draw a scatterplot of the X and Y values
c) make a two-way table of the X and Y values
d) all of the above
- A study of the salaries of full professors at Upper Wabash Tech shows that the median salary for female professors is considerably less than the median male salary. Further investigation shows that the median salaries for male and female full professors are about the same in every department (English, Physics, etc.) of the university. This apparent contradiction is an example of
a) extrapolation
b) Simpson’s paradox
c) causation
d) correlation
Part III. Free Response
Answer all questions. In some cases, we will award partial credit for correct parts of a problem even if the final answer is incorrect. Partial credit will only be given for work that is seen as a step toward the (correct) final answer. Random facts relating to the problem will not get partial credit. To get partial credit, you need to show your work.
- Below are data from Fortune Magazine on the number of research centers in 10 American cities.
    city            rschctrs
1.  Memphis               85
2.  Denver               302
3.  Indianapolis          69
4.  Los Angeles          515
5.  Phoenix              121
6.  San Francisco        345
7.  Detroit              361
8.  Minneapolis          235
9.  Seattle              153
10. Orlando               33
Use this data to answer the questions below:
- What is the five number summary for this data? (5 points)
- What is the interquartile range? (2 points)
- If we were to delete Los Angeles from the data, which would change more, the standard deviation of the variable or the interquartile range? (2 points)
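The arithmetic for this question can be checked with a short Python sketch (Python 3, standard library only). It assumes the textbook (Moore) convention for quartiles: Q1 and Q3 are the medians of the observations below and above the overall median. Other conventions, such as the interpolation used by `numpy.percentile`, can give slightly different quartiles. `five_number_summary` is a helper name chosen for this sketch, not a library function.

```python
from statistics import median, stdev

# Research centers per city, copied from the table above.
centers = {
    "Memphis": 85, "Denver": 302, "Indianapolis": 69, "Los Angeles": 515,
    "Phoenix": 121, "San Francisco": 345, "Detroit": 361,
    "Minneapolis": 235, "Seattle": 153, "Orlando": 33,
}

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using Moore's quartile rule:
    Q1 and Q3 are the medians of the observations below and above the
    overall median; the median itself is left out when n is odd."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # observations below the median
    upper = xs[half + n % 2:]      # observations above the median
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

all_values = list(centers.values())
summary = five_number_summary(all_values)
iqr = summary[3] - summary[1]

# Drop Los Angeles (the largest value) and recompute both measures of spread.
without_la = [v for city, v in centers.items() if city != "Los Angeles"]
summary_wo = five_number_summary(without_la)
iqr_wo = summary_wo[3] - summary_wo[1]
sd_all, sd_wo = stdev(all_values), stdev(without_la)

print("five-number summary:", summary)
print("IQR:", iqr)
print(f"SD with and without Los Angeles: {sd_all:.1f} vs {sd_wo:.1f}")
print(f"IQR with and without Los Angeles: {iqr} vs {iqr_wo}")
```

Under these conventions the standard deviation drops by roughly 30 research centers when Los Angeles is removed, while the IQR moves by less than 15, which reflects the general point that the standard deviation is not resistant to extreme values.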
- Below is a stem-and-leaf plot of the percentage of the population that is Christian in each state in the Northeastern region of the United States. (Hint: the minimum is 36% Christian.)
Stem-and-leaf plot for pctchris
3 | 6 9
4 |
5 | 5 9 9
6 | 1
- What is the mean for this data? (3 points)
- What is the standard deviation for this data? (3 points)
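A minimal sketch for checking the stem-and-leaf computations, assuming each stem is the tens digit and each leaf the ones digit (consistent with the hint that the minimum is 36). It uses the sample standard deviation, which divides by n - 1 as in Moore; the dictionary layout is just one convenient way to transcribe the plot.

```python
from statistics import mean, stdev

# Stems are tens digits and leaves are ones digits, so stem 3 with leaf 6 is 36%.
stem_and_leaf = {3: [6, 9], 4: [], 5: [5, 9, 9], 6: [1]}
pct_christian = [10 * stem + leaf
                 for stem, leaves in stem_and_leaf.items()
                 for leaf in leaves]

xbar = mean(pct_christian)   # sum of the values divided by n
s = stdev(pct_christian)     # sample standard deviation (divides by n - 1)

print("values:", pct_christian)
print(f"mean = {xbar:.2f}, s = {s:.2f}")
```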
- Normal distribution problems
- What proportion of the area under the standard normal curve falls to the right of z = -0.5? (3 points)
- What proportion of the observations of a standard normal distribution falls within 1 standard deviation of the mean (between z = -1 and z = 1)? (3 points)
- A social psychologist has developed a test to measure gregariousness. The test is normed so that it has a mean of 70 and a standard deviation of 20, and the gregariousness scores are normally distributed. What percentage of scores are above 105? (4 points)
- Scores on the California Test of Basic Skills are normally distributed with mean 50 and standard deviation 25. What is the lowest score you would need on the California Test of Basic Skills to be in the top 20% of all scores? (4 points)
- On the California Test of Basic Skills (mean 50 and standard deviation 25), what percentage of scores fall between 35 and 50? (4 points)
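The normal-distribution answers can be cross-checked with Python's standard-library `statistics.NormalDist` (Python 3.8+) instead of a printed z table. Table lookups round z to two decimals, so hand answers may differ slightly in the last digit. The variable names here are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area to the right of z = -0.5
right_of_neg_half = 1 - std_normal.cdf(-0.5)

# Area within one standard deviation of the mean
within_one_sd = std_normal.cdf(1) - std_normal.cdf(-1)

# Gregariousness ~ N(70, 20): share of scores above 105 (z = 1.75)
greg = NormalDist(mu=70, sigma=20)
above_105 = 1 - greg.cdf(105)

# California Test of Basic Skills ~ N(50, 25):
# the cutoff for the top 20% is the 80th percentile
ctbs = NormalDist(mu=50, sigma=25)
top_20_cutoff = ctbs.inv_cdf(0.80)

# Share of scores between 35 and 50 (z from -0.6 to 0)
between_35_50 = ctbs.cdf(50) - ctbs.cdf(35)

for label, value in [
    ("right of z = -0.5", right_of_neg_half),
    ("within 1 SD", within_one_sd),
    ("gregariousness above 105", above_105),
    ("top-20% cutoff", top_20_cutoff),
    ("between 35 and 50", between_35_50),
]:
    print(f"{label}: {value:.4f}")
```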
- The graph below is a histogram drawn in StataQuest of age at first marriage for 296 married persons in the General Social Survey. Each bin is two years wide.
[Figure: histogram of age when first married (horizontal axis marked 10 to 50); vertical axis marked 0, .1, .2]
- What proportion of persons in the sample were married at the ages of either 20 or 21? (Give your best guess based on the graph. Close will get full credit. 2 points)
- The mean age at first marriage in this sample is 21.8. Will the median age of marriage for the sample be greater than, less than, or equal to 21.8? (2 points)
- A researcher regresses years of education (dependent) on number of siblings (independent) using data on individuals from a large survey. She gets the following regression equation:
Ŷ = -.227x + 13.48
- Explain in one or two sentences what the slope says about the relationship between number of siblings and years of education. (3 points)
- A statistics professor has 5 siblings and 20 years of education. What is the residual for the statistics professor? (4 points)
- If the standard deviation of the number of siblings variable is 3.0, and the standard deviation of the years of education variable is 3.15, what is the correlation between education and number of siblings? (4 points)
- Draw the regression line on the graph axes below. (3 points)
[Graph axes: vertical axis marked 5 to 20; horizontal axis, number of brothers and sisters, marked 0 to 15]
- Place an “x” on the graph above to show where the statistics professor (of part b) would appear if graphed on the scatterplot. Then put an “o” on the graph to show the predicted value for the professor based on the regression. (2 points)
- When we delete the statistics professor from the data, the slope of the regression line changes to -.231. Is the statistics professor acting as an influential observation? (2 points)
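A minimal Python sketch of the regression arithmetic, using only the fitted equation and standard deviations given in the problem. It relies on the least-squares identity b = r · s_y / s_x, so r = b · s_x / s_y. The part letters in the comments assume the sub-questions are lettered a through f in order, matching the exam's own reference to "part b"; the helper name `predict_education` is hypothetical.

```python
# Fitted line from the problem: predicted education = -0.227 * siblings + 13.48
SLOPE, INTERCEPT = -0.227, 13.48

def predict_education(siblings):
    """Predicted years of education for a given number of siblings."""
    return SLOPE * siblings + INTERCEPT

# (b) residual = observed y - predicted y for the professor (x = 5, y = 20)
predicted = predict_education(5)
residual = 20 - predicted

# (c) the least-squares slope is b = r * s_y / s_x, so r = b * s_x / s_y
s_siblings, s_education = 3.0, 3.15
r = SLOPE * s_siblings / s_education

# (d) two points are enough to draw the line across the siblings axis (0 to 15)
line_endpoints = [(x, predict_education(x)) for x in (0, 15)]

# (f) relative change in the slope when the professor is deleted
slope_without = -0.231
pct_change = 100 * (slope_without - SLOPE) / SLOPE

print(f"predicted = {predicted:.3f}, residual = {residual:.3f}")
print(f"r = {r:.3f}")
print("line endpoints:", [(x, round(y, 3)) for x, y in line_endpoints])
print(f"slope change = {pct_change:.1f}%")
```

A slope change of under 2% is one informal way to judge that a single point is not strongly influential; the exam does not state a formal threshold.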
- Below is a crosstabulation based on data from the 1986 General Social Survey. The two variables are belief in life after death (based on a survey question) and the education of the respondent in three categories (11 or fewer years of education, 12 years of education, and 13 or more years of education).
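belief in         |       education       |
life after death  |   0-11     12    13+  |  Total
------------------+-----------------------+-------
yes               |    ___    380    436  |   1116
no                |     86     74    ___  |    246
------------------+-----------------------+-------
Total             |    386    ___    522  |   1362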
- Fill in the missing (blank) frequencies in the table above. (3 points)
- Compute the conditional distributions as percentages, treating education as the independent variable and belief in life after death as the dependent variable. Write the percentages below the corresponding frequencies above. (4 points)
- Describe in words the association between the independent and dependent variables (mention both the direction and the strength of the relationship). (4 points)
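The blank cells follow from the row and column totals, and the conditional distributions are column percentages because education is the independent variable. A minimal Python sketch of both steps, using the counts printed in the table (the variable names are ours, chosen for illustration):

```python
# Counts from the crosstab; None marks a blank cell.
categories = ["0-11", "12", "13+"]
yes = [None, 380, 436]          # printed row total: 1116
no = [86, 74, None]             # printed row total: 246
col_totals = [386, None, 522]   # printed grand total: 1362

# Each blank is a column total minus the known cell in that column,
# or the sum of the two known cells when the total itself is blank.
yes[0] = col_totals[0] - no[0]
no[2] = col_totals[2] - yes[2]
col_totals[1] = yes[1] + no[1]

# The filled table must reproduce the printed margins.
assert sum(yes) == 1116 and sum(no) == 246 and sum(col_totals) == 1362

# Conditional distribution of belief within each education category
# (column percentages, because education is the independent variable).
pct_yes = [round(100 * y / t, 1) for y, t in zip(yes, col_totals)]
pct_no = [round(100 * n / t, 1) for n, t in zip(no, col_totals)]
for cat, py, pn in zip(categories, pct_yes, pct_no):
    print(f"{cat:>5}: yes {py:5.1f}%   no {pn:5.1f}%")
```

In these counts, belief is about 6 percentage points lower in the 0-11 group than in the other two groups, which are nearly identical to each other.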