EPE/EDP 660 Exam 3
Dr. Kelly Bradley

{2 points} Name

You MUST work alone: no tutors; no help from classmates. Email me or see me with questions. You will receive a score of 0 if this rule is violated. Minitab (or other approved software) output must be included {2 points}; if using Minitab, include the session window. Answers must be clearly labeled. Do NOT include a copy of the worksheet. In order to receive partial credit, work must be shown.

PART A (18 POINTS): FILL IN THE BLANK (with best choice) {2 points per blank}

(1) The number of levels of a quantitative variable x must be at least ______ more than the order of the polynomial in x that you want to fit.

(2) When 2 or more independent variables are moderately to highly correlated with each other, it is reasonable to suspect an issue with ______.

(3) Predicting y when the x values are outside the range of experimentation is ______.

(4) ______ Regression is a screening method that starts with no predictors. Each of the available predictors is evaluated with respect to how much R² would be increased by adding it to the model.

(5) The ______ can be regarded as a random sample from a N(0, σ²) distribution, so we can check this assumption by checking whether the residuals might have come from a normal distribution.

(6) To fit a straight line, you need at least ______ different x values, and to fit a curve you need at least ______.

(7) An observation whose residual is larger than 2s or 3s in absolute value is a/n ______.

(8) In ______ regression, the β parameter is interpreted as the percentage change in odds for every 1-unit increase in x_i, holding all other x's fixed.

PART B: Short Answer (23 POINTS)

(1) In addition to independent or predictor variables being highly correlated, how can you assess if multicollinearity is present? Explain. {4 points}

(2) Considering the regression setting, list the assumptions about ε. If the assumptions do not hold, what are the potential consequences? Explain. {6 points}

(3) Why would we want to use the standardized β coefficients over the regular β coefficients? Explain. {4 points}

(4) What is meant by parsimony with regard to regression models? Are there times when it is good and times when it is bad? Explain. {3 points}

(5) What is the difference between homoscedasticity and heteroscedasticity, and which is preferable? {3 points}

(6) How is logistic regression unique in terms of the dependent variable? In what way(s) is this helpful? {3 points}

PART C: Data Analysis (55 points)

A college dean desires to estimate students' GPA after their first semester. The dean takes a random sample of 94 freshmen currently enrolled at the college and records their ACT scores, listed by section and as a composite score (ACT Composite), and their high school GPA (HS GPA). The table below contains data for the sample. (Only the first 6 observations are presented; use the full data set for your analysis.) *Adequate ACT (the last column of the table) is discussed and used in item (9).

Term GPA / HS GPA / ACT English / ACT Math / ACT Reading / ACT Science / ACT Composite / Adequate ACT
4.00 / 3.93 / 25 / 24 / 24 / 24 / 24 / 1
1.83 / 3.37 / 23 / 21 / 21 / 24 / 22 / 1
3.79 / 3.92 / 29 / 18 / 31 / 22 / 25 / 1
3.44 / 4.00 / 28 / 30 / 26 / 26 / 28 / 1
4.00 / 4.00 / 27 / 25 / 30 / 23 / 26 / 1
3.31 / 3.19 / 16 / 18 / 18 / 16 / 17 / 0

(1) Produce a graphical summary for the y-variable, Term GPA. Describe the general distribution of y, including a discussion of central tendency and variability. {2 points}

(2) Produce descriptive statistics for all potential predictors (independent variables), HS GPA through ACT Composite; exclude Adequate ACT. At a minimum, include mean, median, standard deviation, and range. Describe general trends, distributions, etc. {3 points}

(3) Produce a correlation matrix and matrix plot of all the variables, excluding Adequate ACT. Do you see any "strong" correlations? Defend. {3 points}

(4) Compute the regression equation, R-square, VIF, standardized coefficient estimates, and standardized residuals for the regression model with all potential independent variables as predictors of Term GPA. (Include the ANOVA table.) Submit the first 8 rows of your Minitab worksheet for this item. {5 points}

  1. What are the R-square and R-square (adjusted)? What do they tell us? {3 points}
  2. Overall, do you feel this is a reasonable model? Defend. {3 points}

(5) Conduct a Stepwise regression analysis of the data. List the best equation. Why is it your choice? (Be sure to explain how the decision was made.) Defend. {5 points}

(6) Conduct a Backward Elimination regression analysis of the data. List the best equation. Why is it your choice? (Be sure to explain how the decision was made.) Defend. {5 points}

(7) Conduct a Best Subsets regression analysis of the data, including PRESS. Discuss the results. {3 points}

(8) Now, compute the regression equation for estimating Term GPA as a function of ACT Composite and HS GPA. Include VIF, standardized coefficient estimates, and standardized residuals, along with the ANOVA table. {3 points}

  1. Check the assumptions of regression. Be sure that you have produced the standardized residuals 4-in-1 plot, or constructed the appropriate plots. {4 points}
  2. Produce Hi (leverages), Cook's Distance, and DFITS. Explore, identify, and discuss outliers and leverage points. {4 points}

(9) A new variable, Adequate ACT, was computed in C8. If the ACT Composite score is greater than 20, then Adequate ACT = 1; if not, Adequate ACT = 0. Produce a binary logistic regression model to predict an Adequate ACT score as a function of High School GPA. {2 points}

  1. Report the maximum likelihood values of the estimates. {2 points}
  2. Report the odds ratios and compute the percent increase or decrease in the estimate of odds of Adequate ACT. {2 points}
  3. Test the overall adequacy of the model. Be sure to report the test statistic and p-value. {3 points}
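
For reference, the coding rule that defines Adequate ACT in item (9) can be sketched as below. This is only an illustration in Python, not part of the required Minitab workflow; the function name `adequate_act` and the `cutoff` parameter are hypothetical names chosen for this sketch, and the scores are the ACT Composite values from the six sample rows shown in the table.

```python
def adequate_act(act_composite, cutoff=20):
    """Return 1 if the composite score is strictly greater than the cutoff, else 0.

    The name and the cutoff parameter are assumptions made for this sketch;
    the exam only states the rule "greater than 20".
    """
    return 1 if act_composite > cutoff else 0


# ACT Composite values from the six observations listed in the table
act_composite = [24, 22, 25, 28, 26, 17]

# Apply the rule to each score to build the binary indicator column
adequate = [adequate_act(score) for score in act_composite]
print(adequate)
```

Note that the comparison is strict, so a composite score of exactly 20 is coded 0.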

(10) From all the regression equations produced above, which do you feel is most desirable? Write the equation. Defend your choice. {3 points}
