Lab Activity 12 – Inference About Regression (57 points total)

Dataset: College Scorecard Mideast

1. (7 pts) Use software to create a scatterplot of Y = 2013 cost of school vs. X = 2010 cost of school for Mideast doctoral schools. Paste the plot below, then summarize the relationship in one sentence that describes its direction, form, and strength. (See Section 12.1 of the online notes.)

2. (3 pts) Find the Pearson correlation (r) of Y = 2013 cost of school and X = 2010 cost of school using software and then paste the output below.

  1. (4 pts) Does the correlation match your statement about the scatterplot from problem 1? Explain why or why not based on the correlation’s sign and magnitude.
  1. (3 pts) Use the correlation to find the proportion of observed variation in the 2013 cost of school that can be explained by the 2010 cost of school (i.e., find the coefficient of determination). (See Section 12.9 of the online notes.)
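
Hint: In simple linear regression, the coefficient of determination is the square of the Pearson correlation. The sketch below illustrates that step only. The five data points and the `pearson_r` helper are made up for illustration; they are not taken from the College Scorecard data and do not produce the lab's answer.

```python
import math

# Hypothetical costs in dollars (NOT from the College Scorecard data).
cost_2010 = [20000, 25000, 30000, 35000, 40000]
cost_2013 = [23000, 27500, 34000, 38500, 45000]


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squares and cross products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(cost_2010, cost_2013)

# Coefficient of determination: proportion of variation in Y explained by X.
r_squared = r ** 2

print(f"r = {r:.4f}")
print(f"r-squared = {r_squared:.4f} ({100 * r_squared:.1f}% of variation explained)")
```

For the lab itself, square the correlation reported by your software rather than recomputing it by hand.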

3. (4 pts) Fit a simple linear regression model of Y = 2013 cost of school vs. X = 2010 cost of school, and include the residual plots. Paste the output below, and then answer the following questions.

  1. (3 pts) Locate the residuals versus fits plot. Based on this plot, do you believe that the constant variance assumption holds?

Hint: The constant variance assumption is reasonable if the points in the residuals versus fits plot are randomly scattered about 0 with no clear pattern. A pattern such as fanning indicates that the assumption is violated.

  1. (3 pts) Locate the histogram of the residuals. Based on this plot, do you believe that the normality assumption holds?

Hint: The normality assumption is reasonable if the histogram of the residuals is approximately bell-shaped.

  1. (5 pts) To determine if there is a significant relationship between the 2010 cost of school (X) and the 2013 cost of school (Y), we need to test

H0: β1 = 0 (slope of ‘Cost_of_school_2010’ is 0)

vs.

Ha: β1 ≠ 0 (slope of ‘Cost_of_school_2010’ is different from 0)

In the Coefficient table, find the test statistic (T-Value) and p-value for this test. Based on this information, is the cost of school in 2010 a significant predictor of the cost in 2013? Why or why not?

T-Value =

p-value =

Conclusion:

  1. (3 pts) Write the estimated regression equation.
  1. (5 pts) Fill in the blanks to interpret the regression equation slope in context of the data: “For every one dollar increase in ______, the mean ______ is estimated to (increase/decrease) by ______ dollars.”

Note: You need to select either “increase” or “decrease” for the portion in parentheses. See Section 12.4 of the online notes.

  1. (3 pts) By hand, use the regression equation to find the expected (fitted) 2013 cost of school when the cost in 2010 was $28,500. Show your work.
  1. (2 pts) Suppose that a certain Mideast doctoral institution with an estimated cost of attendance of $28,500 in 2010 had a cost of $34,500 in 2013. What is the residual for this institution?

Hint: Residual = Observed value of Y – Fitted value of Y
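
The fitted value and residual calculations can be sketched as follows. The intercept `b0`, slope `b1`, and both cost values are hypothetical numbers chosen for easy arithmetic; they are not the estimates from your regression output, so substitute your own coefficients and the values given in the questions above.

```python
# Hypothetical regression coefficients (NOT the lab's estimates).
b0 = 1500.0   # assumed intercept, in dollars
b1 = 1.25     # assumed slope: change in mean 2013 cost per dollar of 2010 cost

# Hypothetical institution, with values different from those in the lab.
x_2010 = 20000.0           # 2010 cost of school
observed_2013 = 27000.0    # observed 2013 cost of school

# Fitted value: plug x into the estimated regression equation.
fitted_2013 = b0 + b1 * x_2010

# Residual = observed value of Y - fitted value of Y.
residual = observed_2013 - fitted_2013

print(f"Fitted 2013 cost: ${fitted_2013:,.2f}")
print(f"Residual: ${residual:,.2f}")
```

A positive residual means the observed cost is above the regression line; a negative residual means it is below.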

  1. (4 pts) Find the 95% confidence and prediction intervals for the 2013 cost of school when the 2010 cost was $28,500 using software. Provide the output and then answer the following questions. (See Section 12.8 of the online notes.)
  1. (4 pts) Write a sentence to interpret the 95% confidence interval in context.
  1. (4 pts) Write a sentence to interpret the 95% prediction interval in context.