EPE/EDP 660 Exam 3
Dr. Kelly Bradley
{2 points} Name
You MUST work alone: no tutors; no help from classmates. Email me or see me with questions. You will receive a score of 0 if this rule is violated. Minitab (or other approved software) output must be included {2 points}; if using Minitab, include the session window. Answers must be clearly labeled. Do NOT include a copy of the worksheet. In order to receive partial credit, work must be shown.
PART A (18 POINTS): FILL IN THE BLANK (with best choice) {2 points per blank}
(1) The number of levels of a quantitative variable x must be at least ______ more than the order of the polynomial in x that you want to fit.
(2) When 2 or more independent variables are moderately to highly correlated with each other, it is reasonable to suspect an issue with ______.
(3) Predicting y when the x values are outside the range of experimentation is ______.
(4) ______ Regression is a screening method that starts with no predictors. Each of the available predictors is evaluated with respect to how much R² would be increased by adding it to the model.
(5) The ______ can be regarded as a random sample from a N(0, σ²) distribution, so we can check this assumption by checking whether the residuals might have come from a normal distribution.
(6) To fit a straight line, you need at least ______ different x values, and to fit a curve you need at least ______.
(7) An observation with a residual larger than 2s or 3s (in absolute value) is a/n ______.
(8) In ______ regression, e^β − 1 is interpreted as the percentage change in odds for every 1-unit increase in xi, holding all other x’s fixed.
PART B: Short Answer (23 POINTS)
(1) In addition to independent or predictor variables being highly correlated, how can you assess if multicollinearity is present? Explain. {4 points}
(2) Considering the regression setting, list the assumptions about ε. If the assumptions do not hold, what are the potential consequences? Explain. {6 points}
(3) Why would we want to use the standardized β coefficients over the regular β coefficients? Explain. {4 points}
(4) What is meant by parsimony with regard to regression models? Are there times when it is good and times when it is bad? Explain. {3 points}
(5) What is the difference between homoscedastic and heteroscedastic, and which is preferable? {3 points}
(6) How is logistic regression unique in terms of the dependent variable? In what way(s) is this helpful? {3 points}
PART C: Data Analysis (55 points)
A college dean desires to estimate students’ GPA after their first semester. The dean takes a random sample of 94 freshmen currently enrolled at the college and records their ACT scores, listed by section and as a composite score (ACT Composite), and their high school GPA (HS GPA). The table below contains data for the sample. (Only the first 6 observations are presented; use the full data set for your analysis.) *Adequate ACT (the last column of the table) is discussed and used in item (9).
Term GPA / HS GPA / ACT English / ACT Math / ACT Reading / ACT Science / ACT Composite / Adequate ACT
4.00 / 3.93 / 25 / 24 / 24 / 24 / 24 / 1
1.83 / 3.37 / 23 / 21 / 21 / 24 / 22 / 1
3.79 / 3.92 / 29 / 18 / 31 / 22 / 25 / 1
3.44 / 4.00 / 28 / 30 / 26 / 26 / 28 / 1
4.00 / 4.00 / 27 / 25 / 30 / 23 / 26 / 1
3.31 / 3.19 / 16 / 18 / 18 / 16 / 17 / 0
(1) Produce a graphical summary for the y-variable, Term GPA. Describe the general distribution of y, including discussion of central tendency and variability. {2 points}
(2) Produce descriptive statistics for all potential predictors (independent variables), HS GPA through ACT Composite; exclude Adequate ACT. At a minimum, include mean, median, standard deviation, and range. Describe general trends, distributions, etc. {3 points}
(3) Produce a correlation matrix and matrix plot of all the variables, excluding Adequate ACT. Do you see any “strong” correlations? Defend. {3 points}
(4) Compute the regression equation, R-square, VIF, standardized coefficient estimates, and standardized residuals for the regression model with all potential independent variables as predictors of Term GPA. (Include the ANOVA table.) Submit the first 8 rows of your Minitab worksheet for this item. {5 points}
- What are the R-square and R-square (adjusted)? What do they tell us? {3 points}
- Overall, do you feel this is a reasonable model? Defend. {3 points}
(5) Conduct a Stepwise regression analysis of the data. List the best equation. Why is it your choice? (Be sure to explain how the decision was made.) Defend. {5 points}
(6) Conduct a Backward Elimination regression analysis of the data. List the best equation. Why is it your choice? (Be sure to explain how the decision was made.) Defend. {5 points}
(7) Conduct a Best Subsets regression analysis of the data, including PRESS. Discuss the results. {3 points}
(8) Now, compute the regression equation for estimating Term GPA as a function of ACT Composite and HS GPA. Include VIF, standardized coefficient estimates, and standardized residuals, along with the ANOVA table. {3 points}
- Check the assumptions of regression. Be sure that you have produced the 4-in-1 plot of the standardized residuals, or constructed the appropriate plots individually. {4 points}
- Produce Hi (leverages), Cook’s Distance, and DFITS. Explore, identify, and discuss outliers and leverage points. {4 points}
(9) A new variable, Adequate ACT, was computed in C8. If the ACT Composite score is greater than 20, then Adequate ACT = 1; if not, then Adequate ACT = 0. Produce a binary logistic regression model to predict Adequate ACT as a function of HS GPA. {2 points}
- Report the maximum likelihood estimates of the model parameters. {2 points}
- Report the odds ratios and compute the percent increase or decrease in the estimated odds of Adequate ACT. {2 points}
- Test the overall adequacy of the model. Be sure to report the test statistic and p-value. {3 points}
(10) From all the regression equations produced above, which do you feel is most desirable? Write the equation. Defend your choice. {3 points}
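
For students using software other than Minitab, the 0/1 coding rule in item (9) and the conversion from a logistic slope to a percent change in odds might be sketched in Python as follows. This is only an illustrative sketch: the function names adequate_act and percent_change_in_odds are hypothetical, and the slope passed in at the end is a made-up value chosen so that the odds exactly double, not a result from the exam data.

```python
import math


def adequate_act(act_composite: float) -> int:
    """Return 1 if the ACT Composite score is greater than 20, otherwise 0."""
    return 1 if act_composite > 20 else 0


def percent_change_in_odds(beta: float) -> float:
    """Percent change in the odds for a 1-unit increase in x,
    given a logistic regression slope beta: (e^beta - 1) * 100."""
    return (math.exp(beta) - 1.0) * 100.0


# ACT Composite scores from the six rows shown in the table above.
composites = [24, 22, 25, 28, 26, 17]
codes = [adequate_act(score) for score in composites]
print(codes)  # [1, 1, 1, 1, 1, 0], matching the Adequate ACT column

# Hypothetical slope (NOT an estimate from the data): beta = ln(2)
# means the odds double, i.e. roughly a 100% increase.
print(round(percent_change_in_odds(math.log(2)), 6))
```

A positive result from percent_change_in_odds indicates an increase in the odds, and a negative result indicates a decrease.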