Learning Problems Ch. 2
2.1 Independent and dependent variables. In each of the following situations, is it more reasonable to simply explore the relationship between the two variables or to view one of the variables as an explanatory variable and the other as a response variable? In the latter case, which is the explanatory variable and which is the response variable?
(a) The amount of time a student spends studying for a statistics exam and the grade on the exam
(b) The weight and height of a person
(c) The amount of yearly rainfall and the yield of a crop
(d) An employee’s salary and number of sick days used
(e) The economic class of a father and of a son
2.5 Your mileage may vary. Figure 2.2 plots the city and highway fuel consumption of 2001 model two-seater cars, from the Environmental Protection Agency’s Model Year 2001 Fuel Economy Guide.
FIGURE 2.2 City and highway fuel consumption for 2001 model two-seater cars, for Exercise 2.5.
(a) There is one unusual observation, the Honda Insight. What are the approximate city and highway gas mileages for this car?
(b) Describe the pattern of the relationship between city and highway mileage. Explain why you might expect a relationship with this pattern.
(c) Does the Honda Insight observation fit the overall relationship portrayed by the other two-seater cars plotted?
2.27 Thinking about correlation. Figure 2.7 (page 106) is a scatterplot of percent decline versus duration in months for 15 bear markets.
(a) Is the correlation r for these data near −1, clearly negative but not near −1, near 0, clearly positive but not near 1, or near 1? Explain your answer.
(b) Figure 2.2 (page 98) shows the highway and city gas mileage for 2001 model two-seater cars. Is the correlation here closer to 1 than that for Figure 2.7 or closer to 0? Explain your answer.
2.31 Coffee and deforestation. Coffee is a leading export from several developing countries. When coffee prices are high, farmers often clear forest to plant more coffee trees. Here are data for five years on prices paid to coffee growers in Indonesia and the rate of deforestation in a national park that lies in a coffee-producing region:
(a) Make a scatterplot. Which is the explanatory variable? What kind of pattern does your plot show?
(b) Find the correlation r step-by-step. That is, find the mean and standard deviation of the two variables. Then find the five standardized values for each variable and use the formula for r. Explain how your value for r matches your graph in (a).
(c) Next, enter these data into your calculator or software and use the correlation function to find r. Check that you get the same result as in (b), up to round-off error.
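The step-by-step recipe in part (b) of Exercise 2.31 can be sketched in Python as follows. This is only an illustration under stated assumptions: the values in `price` and `rate` are made up (they are not the exercise's data), and the helper names `mean`, `sample_sd`, and `correlation` are hypothetical. The sketch standardizes each variable with the sample standard deviation and then averages the products of the standardized values with a divisor of n − 1.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(x, y):
    """Correlation r computed from standardized values."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = sample_sd(x), sample_sd(y)
    # Standardize each observation: (value - mean) / standard deviation.
    z_x = [(v - x_bar) / s_x for v in x]
    z_y = [(v - y_bar) / s_y for v in y]
    # r is the sum of the products of standardized values, divided by n - 1.
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

# Made-up illustrative values, NOT the data from the exercise.
price = [1.0, 2.0, 3.0, 4.0, 5.0]
rate = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(price, rate)
print(f"r = {r:.4f}")
```

For the check in part (c), on Python 3.10 or later `statistics.correlation(price, rate)` should agree with this value up to round-off error.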
2.52 Doctors and poverty. We might expect states with more poverty to have fewer doctors. Table 1.9 (page 28) gives data on the percent of each state’s residents living below the poverty line and on the number of doctors per 100,000 residents in each state.
(a) Make a scatterplot and calculate a regression line suitable for predicting doctors per 100,000 residents from poverty rate. Draw the line on your plot. Surprise: the slope is positive, so poverty and doctors go up together.
(b) The District of Columbia is an outlier, with both very many doctors and a high poverty rate. (D.C. is a city rather than a state.) Circle the point for D.C. on your plot and explain why this point may strongly influence the least-squares line.
(c) Calculate the regression line for the 50 states, omitting D.C. Add the new line to your scatterplot. Was this point highly influential? Does the number of doctors now go down with increasing poverty, as we initially expected?
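The comparison asked for in parts (a) and (c) of Exercise 2.52 amounts to fitting the least-squares line twice, once with and once without the suspected influential point. A minimal sketch, assuming made-up (poverty rate, doctors per 100,000) pairs rather than the values in Table 1.9; the function name `least_squares` and the list `points` are hypothetical, and the last pair is deliberately extreme so that it plays the role of the outlier.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and of squared x-deviations about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up values, NOT Table 1.9. The last pair is an artificial outlier
# that is extreme in both x and y.
points = [(8, 260), (10, 250), (12, 240), (14, 230), (16, 220), (18, 700)]

x_all = [poverty for poverty, _ in points]
y_all = [doctors for _, doctors in points]
a_all, b_all = least_squares(x_all, y_all)

# Refit after dropping the outlier (the last point).
a_trim, b_trim = least_squares(x_all[:-1], y_all[:-1])

print(f"with outlier:    doctors = {a_all:.1f} + {b_all:.2f} * poverty")
print(f"without outlier: doctors = {a_trim:.1f} + {b_trim:.2f} * poverty")
```

In this made-up example the slope changes sign when the extreme point is removed, which is the kind of change that marks a point as influential. Whether the same happens with the real data is for the exercise to determine.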
2.88 The declining farm population. The number of people living on American farms has declined steadily during the 20th century. Here are data on farm population (millions of persons) from 1935 to 1980:
(a) Make a scatterplot of these data and find the least-squares regression line of farm population on year.
(b) According to the regression line, how much did the farm population decline each year on the average during this period? What percent of the observed variation in farm population is accounted for by linear change over time?
(c) Use the regression equation to predict the number of people living on farms in 1990. Is this result reasonable? Why?
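The three quantities Exercise 2.88 asks about (the slope, the percent of variation explained, and a prediction outside the observed years) can be sketched as below. The numbers in `years` and `population` are made up for illustration and are not the exercise's data, and `fit_line` and `predict` are hypothetical helper names. The sketch uses the fact that r² equals the squared sum of cross-products divided by the product of the two sums of squares.

```python
def fit_line(x, y):
    """Return (x_bar, y_bar, slope, r_squared) for the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    # Fraction of the variation in y accounted for by the line.
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return x_bar, y_bar, slope, r_squared

# Made-up values (millions of persons), NOT the data from the exercise.
years = [1935, 1945, 1955, 1965, 1975]
population = [30.0, 24.0, 19.0, 12.0, 9.0]

x_bar, y_bar, slope, r_squared = fit_line(years, population)

def predict(year):
    """Predicted population; the line passes through (x_bar, y_bar)."""
    return y_bar + slope * (year - x_bar)

print(f"average change per year: {slope:.2f} million")
print(f"percent of variation explained: {100 * r_squared:.1f}%")
print(f"prediction for 1990: {predict(1990):.2f} million")
```

With these made-up numbers the prediction for 1990 comes out below zero, a reminder that a line fitted to one period need not give sensible values when extended beyond it.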
2.108 Marital status. Give the marginal distribution of marital status (in percents) for the men in the study of Case 2.2, starting from the counts in Table 2.10.
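A marginal distribution is obtained from a two-way table by totaling the counts for each category of one variable and dividing by the grand total. A minimal sketch, assuming a table stored as a dictionary keyed by marital status, with each list holding the counts across the categories of the other variable. The counts and the category labels here are made up and are not those of Table 2.10.

```python
# Made-up counts, NOT Table 2.10. Each list holds the counts for one
# marital status across the categories of the table's other variable.
counts = {
    "single": [20, 20, 10],
    "married": [100, 200, 100],
    "divorced": [10, 20, 10],
    "widowed": [2, 5, 3],
}

# Total for each marital status, then the grand total over the whole table.
row_totals = {status: sum(row) for status, row in counts.items()}
grand_total = sum(row_totals.values())

# Marginal distribution in percents: each total as a share of the grand total.
marginal_percent = {
    status: 100 * total / grand_total for status, total in row_totals.items()
}

for status, percent in marginal_percent.items():
    print(f"{status:>8}: {percent:.1f}%")
```

The percents should add to 100 apart from round-off, which is a quick check on the arithmetic.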